GAO 


United  States 

General  Accounting  Office 

Washington,  D.C.  20548 


Program  Evaluation  and 
Methodology  Division 


B-239914 
October  16, 1990 

The Honorable Richard B. Cheney
The  Secretary  of  Defense 


Dear  Mr.  Secretary: 



In  this  report,  we  review  the  information  sources  on  which  the  services  base  their 
evaluations  of  the  effectiveness  of  their  technical  training  programs,  recruit  selection,  and 
classification  decisions.  We  undertook  this  review  because  the  technical  sophistication  of 
modern  weaponry  has  intensified  the  need  for  well-qualified  recruits  and  effective  technical 
training.  This  report  identifies  some  critical  gaps  in  the  services’  ability  to  measure  how 
effectively  they  are  selecting  and  preparing  recruits  to  use  and  maintain  today’s  complex 
weapons  systems. 


This  report  contains  recommendations  in  Chapter  5.  The  head  of  a  federal  agency  is  required 
by  31  U.S.C.  720  to  submit  a  written  statement  on  actions  taken  on  these  recommendations  to 
the  Senate  Committee  on  Governmental  Affairs  and  the  House  Committee  on  Government 
Operations  not  later  than  60  days  after  the  date  of  the  report  and  to  the  House  and  Senate 
Committees  on  Appropriations  with  the  agency’s  first  request  for  appropriations  made  more 
than  60  days  after  the  date  of  the  report. 


We  are  sending  copies  of  this  report  to  appropriate  House  and  Senate  committees,  members 
of  Congress  from  the  states  mentioned  in  the  report,  and  the  Director  of  the  Office  of 
Management  and  Budget.  We  will  also  make  copies  available  to  interested  organizations,  as 
appropriate,  and  to  others  upon  request. 


If you have any questions or would like additional information, please call me at
(202) 275-1854. Major contributors to the report are listed in appendix VI.

Sincerely  yours, 

Purpose

The ability of the armed forces to carry out their mission into the next
century will depend on both hardware and personnel considerations: the
reliability and appropriateness of weapons systems, the quality of military
personnel, and the “fit” of human skills to the operating demands
of weapons systems. If the entry-level aptitude, knowledge, and skills of
new recruits should fall short of the human requirements needed to
operate and maintain new technologically sophisticated systems, greater
demands would be placed on the armed services to compensate for the
shortfall through training. The purpose of this report was to examine
the information collected by the Department of Defense (DOD) on both
the quality of its new recruits and the effectiveness of its training in
preparing recruits to operate in a technologically sophisticated
environment.


Background

A recruit is admitted to military service and assigned to an occupational
specialty on the basis of tests taken at recruitment. Upon completion of
basic training, most recruits receive additional classroom training in
their specialty and then are assigned to perform the specialty in the
field. This typical sequence encompasses the three points in a recruit’s
service career where data critical to evaluating the success of training
must be collected: at entrance to military life, during and upon completion
of formal training, and after assignment to a military specialty in
the field.

An adequate system of assessing training effectiveness must include
reliable and valid information at each of these points, and should
examine the interrelationships among these data points to test the
congruence of initial selection and placement data, classroom measures, and
the ultimate criterion, field performance.


During  the  mid-1980’s,  the  services  reported  dramatic  improvements  in 
the  general  qualifications  of  new  recruits.  The  improvements  were 
attributed  to  better  compensation  and  educational  benefits,  increased 
recruiting  efforts,  and  heightened  public  appreciation  of  the  military 
role. These reports did not, however, address the specific area of
technical qualifications among recruits. More recently, the services have
reported  difficulty  in  filling  their  quotas  with  highly  qualified  recruits. 
This  perceived  decline  in  the  ability  levels  of  recruits  entering  training 
raises  questions  about  the  reality  of  that  decline,  about  its  magnitude, 
about  the  effectiveness  of  the  process  by  which  recruits  are  selected  for 
training,  and  about  the  actual  on-the-job  performance  of  those  recruits. 

Results in Brief


GAO found that the aptitude level of recruits did increase during the
1980’s  but  that  most  of  the  improvement  occurred  during  the  first  half 
of  the  decade.  Since  then,  little  change  has  occurred  in  general  aptitude 
for  training,  but  the  levels  of  some  of  the  more  technical  skills  have 
declined  among  recruits,  in  one  case  below  the  1981  level.  Women  and 
members  of  minority  groups  consistently  scored  lower  in  tests  used  to 
assign  recruits  to  more  technical  occupational  specialties  such  as  radar 
specialist  positions. 

GAO concluded that, for most recruits, the services’ selection criteria are
moderately  successful  at  predicting  individual  performance  during 
classroom  technical  training.  However,  they  are  notably  less  successful 
for  women  and  minority  recruits. 

Each service has evaluation mechanisms in place, but only the Army
systematically collects data on the field performance of individual
graduates in a way that would allow comparison of a graduate’s on-the-job
performance with his or her entry-level ability and classroom
performance. These data reveal an even weaker connection for women and
minority group members between criteria used to assign them to
technical specialties and their later field performance. The field evaluation
practices of the Navy are particularly fragmented and have deteriorated
during the 1980’s. GAO found that the lack of reliable field performance
data in the Navy and the Air Force makes realistic assessment of
training effectiveness impossible.

GAO concluded that the insensitivity of selection and placement
measures as predictors of future success for female and minority recruits is
a  matter  of  serious  concern  in  view  of  the  military’s  increasing  reliance 
on  these  groups  to  perform  technical  roles. 


Principal  Findings 


Recent Quality Trends

All services administer the Armed Services Vocational Aptitude Battery
(ASVAB) to new recruits. The primary measure of a recruit’s aptitude is
the Armed Forces Qualification Test (AFQT), which is made up of four
ASVAB subtests. AFQT scores have tended to level off after rising in the
early 1980’s. Average scores on three of the four subtests used to select
candidates for technical training have declined since mid-decade, and
scores on one, the Electronics Information subtest, are lower than in
1981. A smaller percentage of recruits now qualify for the most
demanding technical specialties than at any time since 1981. Women and
minority group members are severely underrepresented among
qualifiers because they score lower, on average, than white males. (See pages
18-31.)


Classroom  Evaluation 
Measures 


Each service has established evaluation mechanisms to monitor
instructional quality and curriculum coverage in classroom training. Overall,
the grading procedures in the courses GAO reviewed appeared to
discriminate acceptably well among levels of student performance (with the
exception of some Army courses where recorded grades were unreliable
indicators of classroom performance). (See pages 32-34, 36-38, and
40-41.)

Selection criteria from ASVAB are moderately successful in predicting the
performance of most students for training, but are significantly less
reliable predictors for women and minority students. While these groups
appeared to overcome their lower scores on aptitude measures in the
Navy and Air Force courses reviewed, the differences in classroom
performance for nonwhite and female students persisted throughout the
Army technical courses reviewed. (See pages 34-36, 38-39, and 40-41.)

GAO developed a statistically more sophisticated summary score from
ASVAB using factor analysis. This factor score generally performed better
than AFQT and the Electronics Composite score in predicting final grades
for all demographic groupings. This finding suggests that broader-based
selection criteria than those currently in use could be more reliable
predictors of classroom performance, at least in the technical areas GAO
reviewed. (See pages 36, 39, and 41.)


Field  Measures  of  Training 
Effectiveness 


The Army’s Skill Qualification Test provides the only objective,
systematically collected estimates of the field performance of individual
graduates of training. The Air Force and the Navy rely instead largely on
feedback mechanisms through which field commanders and supervisors
may submit complaints to the training community if they believe their
graduates have been inadequately trained. In addition, Air Force
evaluation units periodically survey a sample of supervisors of course
graduates for their perceptions of the quality and appropriateness of training.
A similar practice was followed in the Navy until the mid-1980’s.
Internal reports have been sharply critical of the quality of the Navy’s
training assessment procedures, but these deficiencies are only slowly
being corrected. (See pages 45-50.)

Field performance measures have been developed by DOD under the
Joint-Service Job Performance Measurement project and may be
applicable to training assessment purposes. (See page 51.)

ASVAB scores in our sample are weaker predictors of field performance as
measured by the Army than they are of classroom performance and
only predict well for white male recruits. The factor scores developed by
GAO are better predictors than either AFQT or the Electronics qualifying
scores used by the Army. No ASVAB score was significantly correlated
with field performance for women or minority soldiers. (See pages
45-46.)


Recommendations 


GAO believes that evaluating the effectiveness of the training provided
by the services is crucial if they are to meet the future challenges of
changing demographics and increasingly sophisticated weaponry. GAO
therefore recommends that the Assistant Secretary of Defense for Force
Management and Personnel attempt to develop more sensitive indicators
of classroom and field performance in technical specialties for women
and minority recruits from extant data. GAO also recommends that the
Assistant Secretary review alternative measures of field performance
already developed by the services under the Job Performance
Measurement project for their applicability to training and on-the-job
performance evaluation. GAO further recommends that the Secretary of the
Army direct the Training and Doctrine Command to review for
accuracy, appropriateness, and reliability the classroom grading procedures
identified within the report as deficient. Finally, GAO recommends that
the Secretary of the Navy establish a firm deadline for developing a
training evaluation program and that he direct that current resources
allocated to this effort be reexamined for their adequacy.


Agency  Comments 


In a written response to a draft of this report, DOD concurred with all of
its recommendations and identified specific actions to be taken toward
implementing them. DOD also concurred or partially concurred with what
it identified as the main findings contained in the report. (See appendix
V.) We have reviewed these comments and, where appropriate, have
made changes to the text.

Chapter 1

Introduction 


The ability of the armed forces to carry out their mission into the next
century will depend on both hardware and personnel considerations: the
reliability and appropriateness of weapons systems, the quality of military
personnel, and the “fit” of human skills to the operating demands
of weapons systems. If the entry level aptitude, knowledge, and skills of
new recruits should fall short of the human requirements needed to
operate and maintain new technologically sophisticated weapons
systems, greater demands would be placed on the armed services to
compensate for the shortfall through training. In this report, we will
examine the information collected by DOD on both the quality of its new
recruits and the effectiveness of its training in preparing recruits to
operate in a technologically sophisticated military environment.


Recruit Quality in the
1980’s


In hearings before the House Appropriations Committee on the fiscal
year 1988 budget for DOD, the Assistant Secretary for Force
Management and Personnel characterized the changes since 1980 in the nation’s
armed forces in these words: “Today we are recruiting the highest
quality personnel in history. [The services’ personnel possess] . . . high
intelligence, correct experience mix, [and] high skill levels.” The reasons
cited for this “most remarkable turnaround in peacetime history” were
many: higher pay and improved quality of life for members of the
armed forces; the recession and consequent unemployment of the early
1980’s, which widened the pool of applicants; improved educational
benefits for military service; more intensive and effective recruiting;
and recovery from the poor public perception of the military following
the war in Vietnam.

The statistics cited by DOD supported this favorable view. In 1980, 68
percent of recruits were high school graduates (versus 75 percent for
the youth population in general). By 1986, 92 percent of recruits had
high school diplomas. Whereas 65 percent of recruits in 1980 scored in
the top three mental categories on the Armed Forces Qualification Test
(versus 69 percent for the norm group), in 1986, 96 percent achieved
this level.

Yet the demographic and educational realities of the immediate future
are likely to affect this optimistic scenario. The number of young people
available for the military recruit pool will continue to diminish until the
mid-1990’s.1 The composition of the recruit pool will also shift.
According to research sponsored by the Department of Labor, by the
year 2000 five of every six new labor force entrants will be female,
minority group members, or immigrants.2 Meanwhile, the graduates of
the American educational system are said to be falling further behind
the youth of competitor nations in technological literacy at the same
time that U.S. weapons systems are becoming increasingly
sophisticated.3

DOD has also begun to voice concern. Hints of uneasiness emerged in the
fiscal year 1988 appropriations hearings when the Air Force reported
increased difficulty in securing quality recruits. In the same hearings,
the Navy expressed its concern over the steady erosion of its Delayed
Entry Pool, the program under which applicants agree to enter the
service within a year. In addition, for the first time in eight years, the
Army  failed  to  meet  its  quarterly  recruiting  quota  in  the  first  quarter  of 
fiscal  year  1989. 


Recruit  Training 


Figure  1.1  identifies  the  typical  sequence  that  occurs  during  the  early 
stages  of  a  recruit’s  time  in  the  military.  As  shown,  after  their  basic 
training — the  length  and  content  of  which  varies  by  service — most 
recruits  attend  additional  training  to  equip  them  to  function  effectively 
in  some  occupational  specialty.  The  recruit’s  area  of  specialization  is 
determined by service needs, qualifications as determined on tests
administered  during  the  recruiting  process,  and  individual  interests. 


1U.S. Bureau of the Census, Projections of the Population of the United States, by Age, Sex, and Race:
1988 to 2080, Current Population Reports, Series P-25, No. 1018 (Washington, D.C.: U.S. Government
Printing Office, 1989), p. 6.

2William B. Johnston and Arnold H. Packer, Workforce 2000: Work and Workers for the 21st Century
(Indianapolis, Indiana: Hudson Institute, 1987), p. 95. See also U.S. Office of Personnel Management,
Civil Service 2000 (Washington, D.C.: U.S. Government Printing Office, 1988).

3Martin Binkin, Military Technology and Defense Manpower (Washington, D.C.: The Brookings
Institution, 1986). See also Aerospace Education Foundation, America's Next Crisis: The Shortfall in
Technical Manpower (Arlington, Va.: The Aerospace Education Foundation, 1989); and National Research
Council, A Challenge in Numbers: People in the Mathematical Sciences (Washington, D.C.: National
Academy of Sciences, 1990).

The training curriculum for each occupational specialty is designed
through a structured set of procedures called Instructional System
Development (ISD) that draws heavily on the work by Tyler and others
on the behavioral objectives of instruction.4 The ISD model consists of the
following five steps:

1. Determine job requirements through detailed analysis of tasks
performed in an occupational specialty.

2. Determine type of instruction (formal classroom, on-the-job, or other)
that best suits the student population and task requirements.

3. Develop objectives that specify the desired behaviors, the conditions
under which they are to be demonstrated, and an acceptable standard of
performance.

4. Plan and develop instructional methods, media, and equipment.

5. Conduct and evaluate instruction.

A student’s progress through an ISD-developed curriculum is measured
by criterion-referenced tests at the end of each block of training. A
student passes the course after he or she has performed each task
identified as a job requirement at the level of competency defined as
acceptable. Continuous monitoring of job requirements is needed to
assure that course objectives remain relevant.

Upon successful completion of classroom training in the occupational
specialty, the recruit is ready for assignment in the field to carry out the
duties requiring the skills acquired during training. Formal training is
now complemented by the necessary on-the-job training to permit the
recruit to function as part of a unit with a defined mission in a
real-world setting.


4See, for example, R.W. Tyler, Basic Principles of Curriculum and Instruction (Chicago: University of
Chicago Press, 1950); and R.W. Tyler, R.M. Gagne, and M. Scriven, Perspectives of Curriculum
Evaluation (Chicago: Rand McNally, 1967).


Objectives, Scope, and
Methodology


The purpose of our study is twofold: to profile the aptitudes of the
recruits  who  entered  the  service  from  1981  to  1989,  and  to  evaluate  the 
military services’ ability to select successful trainees and to assess their
training  and  work  performance.  We  will  examine  the  three  points  in  a 
recruit's  service  career  where  data  critical  to  performing  a  thorough 
evaluation  of  training  must  be  collected:  (1)  at  entrance  to  military  life, 
prior  to  assignment  to  an  occupational  specialty;  (2)  during  training, 
when  the  recruit’s  mastery  of  the  specialty’s  basics  is  assessed;  and  (3) 
after  assignment  to  the  field,  where  what  was  learned  in  the  classroom 
must  be  applied  in  the  work  environment.  (See  figure  1.2.) 

Figure 1.2: Data Sources and Comparisons

[Figure not reproduced. Its labels indicate that comparisons among the data
sources test the effectiveness of selection procedures and the effectiveness
of classroom training.]

The evaluation model underlying our review assumes the need to
interrelate these three points. Comparing the information collected at points
1 and 2 can provide some insight into the ability of the services to
predict how well recruits will perform in training on the basis of their
scores in qualifying tests. The strength of the relationship between
points 2 and 3 is a partial measure of the validity and effectiveness of
training. Finally, the relationship between points 1 and 3 is an estimate
of the effectiveness of the services’ selection and training procedures.
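
The three comparisons in the model can be illustrated with a short computation. The following is a minimal sketch, assuming that each data point is summarized by one numeric score per recruit and that the Pearson correlation is an acceptable index of the strength of each relationship. The report does not prescribe a statistic at this point, and the field names and sample records are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def model_comparisons(records):
    """Correlate the three data points of the evaluation model pairwise.

    Each record is a dict with hypothetical keys:
      entry_score      point 1, qualifying test score at entrance
      classroom_grade  point 2, final grade in formal training
      field_score      point 3, measured performance in the field
    """
    entry = [r["entry_score"] for r in records]
    classroom = [r["classroom_grade"] for r in records]
    field = [r["field_score"] for r in records]
    return {
        "selection_vs_classroom": pearson(entry, classroom),  # points 1 and 2
        "classroom_vs_field": pearson(classroom, field),      # points 2 and 3
        "selection_vs_field": pearson(entry, field),          # points 1 and 3
    }


if __name__ == "__main__":
    # Invented records for illustration only; not data from the review.
    sample = [
        {"entry_score": 105, "classroom_grade": 81, "field_score": 70},
        {"entry_score": 112, "classroom_grade": 88, "field_score": 74},
        {"entry_score": 98, "classroom_grade": 79, "field_score": 77},
        {"entry_score": 120, "classroom_grade": 93, "field_score": 82},
        {"entry_score": 101, "classroom_grade": 85, "field_score": 68},
    ]
    for name, r in model_comparisons(sample).items():
        print(f"{name}: r = {r:.2f}")
```

Running the same function separately on subgroups (for example, male and female recruits) would give the kind of first-level comparison the review describes, subject to the sample-size limits noted later in this chapter.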

The model is, of course, simplistic and in need of considerable
expansion. A fully detailed model would have to consider other influences on
performance, such as on-the-job experiences, and would need to be able
to determine the location of a problem if relationships between the three
points were weaker than anticipated. Yet, the model, at whatever level
of sophistication, would at a minimum require data at these three
critical points in a recruit’s service career.

We reviewed the information collection practices of each service at the
three points identified in the model. For a selected number of
occupational specialties (our focus is on training for the more technical
occupational specialties), we reviewed the data that have been collected for
insights they provide into the services’ selection and evaluation
procedures, particularly as they affect women and minority groups.

Our study is organized around three evaluation questions, each
corresponding to one of the model data points. Each question is addressed in
a separate chapter.

1. How has the aptitude of recruits for technologically sophisticated
specialties changed since 1980?

DOD tracks recruit aptitude according to four broad mental categories
based on the scores on the Armed Forces Qualification Test (AFQT). (See
table 1.1.) AFQT is a composite of four of the ten tests from the Armed
Services Vocational Aptitude Battery (ASVAB) administered to every
potential recruit. We examined some other components of ASVAB in
greater detail, particularly those subtests that are used to qualify
candidates for high technology occupational specialties.


Table 1.1: AFQT Scores, Categorized

AFQT category    AFQT percentile score    Trainability
I                93-99                    Well above average
II               65-92                    Above average
IIIA             50-64                    Average
IIIB             31-49                    Average
IV               10-30                    Below average
V(a)             1-9                      Well below average

(a) Category V examinees are excluded by law from military service.

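The percentile bands in table 1.1 amount to a simple lookup. The sketch below is illustrative only: the function names are hypothetical, and it assumes whole-number percentile scores from 1 to 99, as the table's ranges imply.

```python
# Percentile bands as listed in table 1.1: (low, high, category, trainability).
AFQT_BANDS = [
    (93, 99, "I", "Well above average"),
    (65, 92, "II", "Above average"),
    (50, 64, "IIIA", "Average"),
    (31, 49, "IIIB", "Average"),
    (10, 30, "IV", "Below average"),
    (1, 9, "V", "Well below average"),
]


def afqt_category(percentile):
    """Return the AFQT category for a whole-number percentile score (1-99)."""
    for low, high, category, _trainability in AFQT_BANDS:
        if low <= percentile <= high:
            return category
    raise ValueError(f"percentile must be between 1 and 99, got {percentile}")


def excluded_by_law(percentile):
    """Category V examinees are excluded by law from military service."""
    return afqt_category(percentile) == "V"


if __name__ == "__main__":
    for score in (99, 65, 50, 49, 10, 9):
        print(score, afqt_category(score), excluded_by_law(score))
```

The bands are kept in a single list so that the category boundaries appear in one place and can be checked against the table directly.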

2.  How  useful  are  the  data  collected  by  the  services  before  and  during 
classroom  training  for  selecting  individuals  for  high  technology  roles 
and  for  evaluating  the  effectiveness  of  this  training? 

We examined the measures of recruit performance collected during
training and assessed their utility for evaluating training effectiveness,
as well as for providing information on the validity of procedures used
to assign recruits to training.

3.  How  well  do  the  services’  selection  criteria  and  training  evaluation 
measures  predict  success  in  high  technology  roles? 

We examined the procedures used by each of the services to assess the
impact of training on actual job performance. We also related these
procedures to the ASVAB scores used to select trainees and to classroom
measures of training success, in order to estimate the predictive validity of
these measures.

In  view  of  the  demographic  shifts  projected  for  the  labor  force  over  the 
next  decade,  we  provided  separate  answers  to  each  of  these  questions, 
wherever  possible  and  appropriate,  for  women  and  minorities. 

We  defined  high  technology  roles  as  those  occupational  specialties  for 
which  the  services  require  a  qualifying  score  in  electronics  substantially 
above the mean. For our review, we selected a sample of 13 such
courses (five from the Army and four each from the Navy and the Air
Force) from which we collected data on individual student
performance. Each of these courses is intended to provide a recruit the
necessary introductory training to qualify as an apprentice in his specialty.

In  the  course  of  our  review,  we  interviewed  officials  responsible  for 
training  evaluation  in  the  Office  of  the  Secretary  of  Defense  and  within 
each  of  the  three  services.  We  visited  four  service  training  centers  and 
the  facilities  maintained  by  each  of  the  services  for  research  into 
training  and  other  personnel  issues,  as  well  as  the  Training  Performance 
Data  Center  in  the  Office  of  the  Secretary  of  Defense.  Our  final  data 
base  was  compiled  from  information  received  from  all  of  these  sources, 
but our primary source for ASVAB and demographic data was the Defense
Manpower Data Center. We also received information from the Center
for Naval Analyses on technical adjustments to ASVAB validity estimates,
and on the ASVAB norm group. This study was conducted in accordance
with  generally  accepted  government  auditing  standards. 

Strengths and
Limitations  of  Our 
Study 


Our review of the quality trends among the 2.3 million recruits who
entered military service from 1981 to 1989 is more finely grained than
the traditional counts of recruits in each of four mental categories
routinely reported to the Congress. We report the differences among racial
groupings and between male and female recruits, and we examine
differential trends among the various areas measured by ASVAB. We
assumed the reliability and validity of the widely researched ASVAB and
its subtests and made no independent review of these factors. However,
we did develop an independent scoring procedure for ASVAB that
suggests an alternative, and apparently more valid, approach to assigning
recruits to occupational specialties.

The intent of our review of classroom grades and other evaluation
measures was to identify the major sources of training evaluation
information now in place in the services, and to make use of the objective data
we  collected  to  address  some  concerns  about  recent  trends  in  recruit 
quality  and  the  future  composition  of  the  recruit  pool. 

Two important considerations about our sample of students limit any
attempt to generalize our findings. First, we deliberately chose
occupational specialties for which the services required above average mental
qualifications. While the types of classroom measures employed in these
courses would most likely be found in other courses with similar
requirements, we can say little about the evaluation procedures for less
demanding specialties. Second, in part because of the nature of the
specialties we chose, our sample contained relatively few members of
minority groups and very few women. This fact limited the power of our
statistical analysis of these subgroups, and allowed only first-level
comparisons (that is, white versus nonwhite; male versus female).
Nevertheless, even at this level, we believe we have identified some important
differences and gaps in the available data for determining the success of
training outcomes. These differences and gaps, together with other
findings from our analyses, strongly suggest the need for further, more
targeted evaluation of its training efforts by the military.

In 1980, there were 2.4 million more American youths aged 18-21 than
there are today. This age group, which now numbers 15 million, will
diminish to 13.5 million by the mid-1990’s. This 15-year, 22-percent
decline in the population from which the all-volunteer force draws its
new personnel must be a matter of concern to military recruiters. The
concern is exacerbated when we consider the technological aptitude of
the potential recruit pool: it appears that the graduates of our public
schools are becoming less technologically literate when compared to
their peers in other developed nations, and this decline is occurring just
as our weapons systems are reaching new heights of technological
sophistication.
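
The 22-percent figure follows from the population counts quoted above. As a quick check, the short sketch below reproduces the arithmetic; the variable names are ours, and the only inputs are the three numbers given in the paragraph.

```python
# Figures quoted in the text, in millions of youths aged 18-21.
current_population = 15.0    # "now numbers 15 million"
surplus_in_1980 = 2.4        # "2.4 million more ... than there are today"
projected_mid_1990s = 13.5   # "will diminish to 13.5 million"

# Implied 1980 population and the proportional decline to the mid-1990's.
population_1980 = current_population + surplus_in_1980
decline = (population_1980 - projected_mid_1990s) / population_1980

print(f"1980 population: {population_1980:.1f} million")
print(f"Decline from 1980 to the mid-1990's: {decline:.1%}")
```

The implied 1980 base is 17.4 million, so the drop of 3.9 million is about 22 percent of that base.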

However,  by  the  standards  set  by  dod,  the  quality  of  military  recruits  in 
the  first  half  of  the  1980’s  did  not  decline  in  proportion  to  the  dwindling 
numbers  in  the  recruit  pool.  As  we  have  noted  in  the  previous  chapter, 
dod  reported  “the  most  remarkable  turnaround  in  peacetime  history” 
between 1980 and 1986, with dramatic increases in the proportion of
recruits  who  had  graduated  from  high  school  and  who  scored  in  the  top 
three  afqt  categories. 

In  this  chapter,  we  will  address  our  first  evaluation  question:  How  has 
the  aptitude  of  recruits  for  technologically  sophisticated  specialties 
changed since 1980? Our purpose is threefold: (1) to determine whether
the  quality  gains  as  defined  and  reported  by  the  services  in  the  first  half 
of  the  1980’s  are  being  maintained;  (2)  to  expand  the  definition  of 
quality  to  include  other  measures  beyond  those  traditionally  reported 
(that  is,  high  school  graduation  and  service-defined  mental  category); 
and  (3)  to  examine  in  greater  detail  two  occupational  specialties  that,  by 
service  definition,  require  higher  entry  levels  of  technological  sophisti¬ 
cation.  We  will  report  the  trends  we  found  in  the  scores  achieved  by 
recruits  from  fiscal  year  1981  through  fiscal  year  1989  on  some  of  the 
various  subtests  and  composites  of  the  Armed  Services  Vocational  Apti¬ 
tude  Battery  (asvab),  the  instrument  used  by  all  services  to  both  qualify 
applicants  for  entry  and  classify  recruits  into  occupational  specialties. 
We  will  examine  in  detail  those  scores  that  are  used  by  the  services  to 
qualify  recruits  for  more  technologically  demanding  specialties. 


Armed  Services 
Vocational  Aptitude 
Battery  (ASVAB) 


asvab  is  composed  of  ten  subtests  measuring  abilities  considered  impor¬ 
tant  for  military  service.  Scores  from  asvab  subtests  are  combined  to 
form  composite  scores  thought  to  be  related  to  general  types  of  occupa¬ 
tional  specialties  within  the  armed  forces.  While  different  services  use 
different  methods  to  combine  subtest  scores  into  composites,  all  services 
use the same component subtests for two composite scores, the Armed
Forces Qualification Test (afqt) and the Electronics Composite. We
examined  these  two  in  detail  to  determine  how  they  have  changed 
during  the  1980’s. 

Armed  Forces 

Qualification  Test  (AFQT) 

An  afqt  score  is  currently  derived  from  a  recruit’s  scores  on  four  asvab 
subtests: Word Knowledge, Paragraph Comprehension, Arithmetic Reasoning,
and Mathematics Knowledge.1 afqt scores are the primary
mental  criterion  for  entry  into  the  armed  services.  Figure  2.1  displays 
the  mean  composite  afqt  scores  for  men  and  women  from  1981  through 
1989.  Actual  mean  scores  for  this  period  may  be  found  in  appendix  I. 

Figure 2.1: Mean AFQT Scores, by
Gender: 1981-89

Legend: Male; Female

Note: AFQT scores were computed as the sum of standard scores on Arithmetic Reasoning and
Mathematics Knowledge, plus the Verbal standard score times two. This is the formula used by DOD
as of January 1, 1989.

Source:  Data  are  from  the  Defense  Manpower  Data  Center. 
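
The formula in the note to figure 2.1 can be written out as a short calculation. The following Python sketch is illustrative only: the function name and the sample scores are hypothetical, and it assumes the Verbal standard score has already been derived from the Word Knowledge and Paragraph Comprehension subtests, a step the note does not spell out.

```python
def afqt_composite(arithmetic_reasoning, mathematics_knowledge, verbal):
    """Sum of standard scores per the 1989 formula quoted in the figure note:
    Arithmetic Reasoning + Mathematics Knowledge + 2 x Verbal.

    All three inputs are assumed to be standard scores; how the Verbal score
    is built from its two subtests is outside this sketch.
    """
    return arithmetic_reasoning + mathematics_knowledge + 2 * verbal


# Hypothetical recruit, not taken from the report's data.
example_score = afqt_composite(arithmetic_reasoning=50,
                               mathematics_knowledge=55,
                               verbal=52)
```

Because the Verbal score is counted twice, a one-point change in Verbal moves the composite by two points, whereas a one-point change in either mathematics subtest moves it by one.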


1Before 1989, AFQT scores were computed differently. In order to maintain comparability, we
computed AFQT scores of all recruits using the 1989 definition and the standard subtest scores provided
by  the  Defense  Manpower  Data  Center. 
Overall afqt scores improved approximately eight points between 1981
and  1989.  This  improvement  occurred  among  both  male  and  female 
recruits.  However,  despite  fluctuations  over  the  years,  the  scores  of 
male  recruits  began  and  ended  the  decade  slightly  higher  than  female 
scores.  Male  scores  continued  to  increase  each  year  until  1988,  although 
their  rate  of  increase  was  greatest  in  the  first  four  years.  Female  scores 
improved  dramatically  from  1981  to  1983  but  then  flattened  out,  so  that 
by  the  end  of  the  decade  they  were  lower  than  in  any  year  since  1985. 

afqt  scores  differed  more  substantially  across  racial/ethnic  groupings 
than  between  genders.  (See  figure  2.2.)  White  recruits  began  the  decade 
with  scores  approximately  21  points  higher  than  minority  recruits.  By 
1989,  this  difference  had  shrunk  to  15  points.  The  bulk  of  the  relative 
gain  by  minority  recruits,  however,  had  occurred  by  1985,  and  any  nar¬ 
rowing  of  this  gap  since  then  has  been  slight. 


Figure  2.2:  Mean  AFQT  Scores,  by  Race/ 
Ethnicity:  1981-89 


Legend: White; Black; Hispanic; Other

Note: AFQT scores were computed as the sum of standard scores on Arithmetic Reasoning and
Mathematics Knowledge, plus the Verbal standard score times two. This is the formula used by DOD
as of January 1, 1989.

Source:  Data  are  from  the  Defense  Manpower  Data  Center. 
Mean afqt scores in all services were significantly higher in 1989 than
in  1981.  (See  figure  2.3.)  Army  recruits  showed  the  greatest  gain. 
Average  Army  scores  were  substantially  lower  than  those  of  other  ser¬ 
vices  at  the  beginning  of  the  decade,  but  by  1986  they  had  increased  to 
approximately  the  same  level  as  scores  achieved  by  Navy  and  Marine 
recruits.  Navy  scores  peaked  in  1983  and  have  declined  somewhat 
slowly  and  erratically  since  then  to  a  level  less  than  2  points  higher  than 
they  were  at  the  beginning  of  the  decade.  Air  Force  afqt  scores  have 
consistently  averaged  higher  than  the  other  services’  and  have  not  dis¬ 
played  their  tendency  to  plateau  at  mid-decade  levels. 


Figure  2.3:  Mean  AFQT  Scores,  by 
Service:  1981-89 


Legend: Army; Navy; Air Force; Marine Corps

Note: AFQT scores were computed as the sum of standard scores on Arithmetic Reasoning and
Mathematics Knowledge, plus the Verbal standard score times two. This is the formula used by DOD
as of January 1, 1989.

Source:  Data  are  from  the  Defense  Manpower  Data  Center. 

Figure  2.4  displays  the  service-wide  mean  scores  on  each  of  the  four 
component  subtests  that  make  up  afqt.  For  two  of  the  subtests,  Word 
Knowledge  and  Paragraph  Comprehension,  the  pattern  is  quite  similar, 
with  the  sharpest  gains  occurring  by  1985,  and  little  change  thereafter. 
Scores in Mathematics Knowledge and Arithmetic Reasoning increased
substantially  between  1981  and  1984.  Arithmetic  Reasoning  scores 
declined  after  that  point,  but  scores  in  Mathematics  Knowledge  have 
continued  to  rise  and  were  the  only  subtest  scores  to  increase  from  fiscal 
year  1988  to  fiscal  year  1989. 


Figure 2.4: Mean AFQT Subtest Scores,
1981-89

Legend: Arithmetic Reasoning; Word Knowledge; Paragraph Comprehension; Mathematics Knowledge

Source: Data are from the Defense Manpower Data Center.


Electronics Composite
Scores


The  Electronics  Composite  score  is  defined  by  each  service  as  the  sum  of 
four  subtest  scores:  Arithmetic  Reasoning,  Mathematics  Knowledge, 
Electronics  Information,  and  General  Science.  Figure  2.5  displays  the 
mean  Electronics  Composite  score  for  men  and  women  from  1981 
through  1989.  Figure  2.6  presents  the  same  information  by  racial/ethnic 
grouping. 
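
As a rough illustration of the definition above, the composite is a plain sum of four subtest standard scores, two of which it shares with afqt. The Python sketch below assumes scores are held in a dictionary keyed by subtest name; the names of the constants and the function, and the sample scores, are hypothetical.

```python
# Subtests named in the report for each composite.
AFQT_SUBTESTS = {
    "Word Knowledge",
    "Paragraph Comprehension",
    "Arithmetic Reasoning",
    "Mathematics Knowledge",
}
ELECTRONICS_SUBTESTS = {
    "Arithmetic Reasoning",
    "Mathematics Knowledge",
    "Electronics Information",
    "General Science",
}


def electronics_composite(standard_scores):
    """Sum the four standard scores that make up the Electronics Composite."""
    return sum(standard_scores[name] for name in ELECTRONICS_SUBTESTS)


# Subtests that contribute to both composites.
shared_subtests = AFQT_SUBTESTS & ELECTRONICS_SUBTESTS

# Hypothetical recruit, not taken from the report's data.
sample_scores = {
    "Arithmetic Reasoning": 55,
    "Mathematics Knowledge": 60,
    "Electronics Information": 52,
    "General Science": 58,
}
sample_composite = electronics_composite(sample_scores)
```

The two shared subtests are the reason the two composites tend to move together across demographic groups and services.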
Figure 2.5: Mean Electronics Composite
Scores,  by  Gender:  1981-89 


Chapter  2 

The  Quality  of  Military  Recruits:  1981-89 


Legend: Male; Female

Note: Electronics Composite scores were computed as the sum of standard scores on Arithmetic
Reasoning,  Mathematics  Knowledge,  Electronics  Information,  and  General  Science. 

Source:  Data  are  from  the  Defense  Manpower  Data  Center. 
Figure 2.6: Mean Electronics Composite
Scores, by Race/Ethnicity: 1981-89

Note: Electronics Composite scores were computed as the sum of standard scores on Arithmetic
Reasoning,  Mathematics  Knowledge,  Electronics  Information,  and  General  Science. 

Source:  Data  are  from  the  Defense  Manpower  Data  Center. 


Electronics  Composite  mean  scores  rose  approximately  3-1/2  points 
between  1981  and  1989.  They  peaked  in  1984  and  experienced  a  gradual 
decline thereafter. Female recruits scored approximately 11 points
lower  than  male  recruits  during  this  period. 

Because  of  the  overlap  between  the  Electronics  Composite  and  afqt,  the 
racial  differences  are  similar.  In  1981,  white  recruits  scored  approxi¬ 
mately  24  points  higher  than  minorities  on  this  composite.  By  1989,  the 
gap  had  narrowed  to  approximately  19  points,  but  most  of  these  gains 
by  minorities  were  attained  in  the  earlier  part  of  the  decade.  By  1989, 
the  scores  of  all  racial  groups  were  declining. 

The  interservice  pattern  of  Electronics  Composite  scores  is  again  similar 
to  the  afqt  patterns  discussed  previously.  (See  figure  2.7.)  Army  scores 
progressed  from  an  average  of  ten  points  lower  than  the  next  closest 
service  in  1981  to  being  essentially  the  same  as  Navy  and  Marine  scores 
by 1986. Mean scores for these three services changed very little from
1985 to 1988, but Army and Navy scores declined significantly in 1989.
Air Force scores have remained higher than other services’ but have
fluctuated  irregularly  since  1984. 


Figure  2.7:  Mean  Electronics  Composite 
Scores,  by  Service:  1981-89 


Legend: Army; Navy; Air Force; Marine Corps

Note: Electronics Composite scores were computed as the sum of standard scores on Arithmetic
Reasoning,  Mathematics  Knowledge,  Electronics  Information,  and  General  Science. 

Source:  Data  are  from  the  Defense  Manpower  Data  Center. 

The  trends  during  this  period  were  not  the  same  for  all  the  subtests  that 
comprise  the  Electronics  Composite  score.  (See  figure  2.8.)  Scores  in 
General  Science  and  Mathematics  Knowledge  increased  steadily  over 
these  years.  Scores  in  Arithmetic  Reasoning  increased  from  1981  to 
1983  but  by  1986  had  declined  again  and  have  since  remained  relatively 
constant.  In  1981,  recruits  scored  higher  in  Electronics  Information  than 
in  the  other  component  subtests,  but  by  1988  the  scores  were  lower  than 
for  other  subtests  and  lower  even  than  they  had  been  at  the  beginning  of 
the  decade.  In  1989,  they  declined  further. 
Figure 2.8: Mean Electronics Composite
Subtest Scores, 1981-89

Source: Data are from the Defense Manpower Data Center.


Number  of  Recruits 
Qualified  for  High 
Technology  Specialties 


An  alternative  method  for  examining  trends  in  recruit  qualifications  is 
to enumerate the number of recruits whose asvab scores meet the minimum
standards required for entry into certain occupational specialties.
Each  service  defines  “cutting  scores”  for  classifying  recruits — that  is,  a 
minimum  score  on  one  or  more  asvab  composites  is  required  for  entry 
into  training  for  each  specialty.2  This  score  can  be  adjusted  to  control 
flow  into  specialties  as  needed.  We  chose  two  of  the  more  demanding 
specialties,  both  of  them  in  the  Air  Force,  and  computed  the  number  of 
recruits into each service from 1981 to 1989 whose asvab scores would
have  qualified  them  for  technical  training  in  these  specialties.  We  chose 
these  specialties  as  examples  of  high  technology  military  occupations 
because  they  share  cutting  scores  with  a  number  of  other  technologi¬ 
cally  oriented  specialties.  Our  purpose  was  not  to  imply  either  a  surplus 
or  deficit  of  requisite  manpower. 
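
The counting procedure described above can be sketched in a few lines of Python. The cutting scores used here are the ones this chapter's footnotes give for the two Air Force specialties (an Electronics Composite of 230 for control and warning radar specialists; an Electronics Composite of 235 and a mechanical score of 247 for systems repair technicians). Everything else is an assumption made for illustration: the record layout, the function names, the group labels, and the sample recruits are hypothetical, and a score equal to the cutting score is assumed to qualify.

```python
from collections import defaultdict

# Cutting scores as reported in the chapter's footnotes.
RADAR_CUTS = {"electronics": 230}
SYSTEMS_REPAIR_CUTS = {"electronics": 235, "mechanical": 247}


def qualifies(scores, cuts):
    """True when every required composite meets or exceeds its cutting score."""
    return all(scores[name] >= minimum for name, minimum in cuts.items())


def percent_qualified_by_group(recruits, cuts):
    """Return {group: percent of that group's recruits who qualify}."""
    totals = defaultdict(int)
    qualified = defaultdict(int)
    for recruit in recruits:
        totals[recruit["group"]] += 1
        if qualifies(recruit["scores"], cuts):
            qualified[recruit["group"]] += 1
    return {group: 100.0 * qualified[group] / totals[group] for group in totals}


# Hypothetical recruits, not taken from the report's data.
sample_recruits = [
    {"group": "A", "scores": {"electronics": 240, "mechanical": 250}},
    {"group": "A", "scores": {"electronics": 232, "mechanical": 240}},
    {"group": "B", "scores": {"electronics": 229, "mechanical": 250}},
    {"group": "B", "scores": {"electronics": 236, "mechanical": 247}},
]
radar_percent = percent_qualified_by_group(sample_recruits, RADAR_CUTS)
repair_percent = percent_qualified_by_group(sample_recruits, SYSTEMS_REPAIR_CUTS)
```

Reporting percentages within each group, rather than raw counts alone, keeps the comparison meaningful when the total number of recruits changes from year to year.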


2Other qualifications may also apply, for example, possession of a valid driver’s license, special
physical qualifications, or the ability to obtain appropriate levels of security clearance.
Figure 2.9: Number of Recruits Qualifying
for  Training  as  Control  and  Warning 
Radar  Specialists,  1981-89 


Figure  2.9  depicts  the  number  of  recruits  during  the  period  in  question 
who  would  have  qualified  for  training  as  control  and  warning  radar  spe¬ 
cialists  in  the  Air  Force  on  the  basis  of  their  asvab  scores.3  In  1981, 
approximately  38,000  recruits  qualified  for  this  specialty.  By  1986,  the 
number  of  recruits  qualifying  had  risen  to  more  than  69,000,  but  since 
then  the  number  has  declined  to  just  under  58,000.  In  1981, 87  percent 
of  the  recruits  qualifying  for  training  as  control  and  warning  radar  spe¬ 
cialists  were  white  males,  although  only  about  two  thirds  of  1981 
recruits  were  white  males.  These  proportions  had  not  changed  substan¬ 
tially  by  1989,  when  white  males  comprised  84  percent  of  qualified 
recruits  but  only  61  percent  of  the  general  recruit  population. 
Source: Data are from the Defense Manpower Data Center.

Because the total manpower quotas for the services have varied over
this period, we also computed the percent of all recruits within the
gender and racial/ethnic groups who qualified for this specialty. The
results are displayed in figure 2.10.


3We used the cutting score that was current for Air Force recruits in May 1989: an Electronics
Composite score of 230.


Figure  2.10:  Percent  of  Recruits 
Qualifying  for  Training  as  Control  and 
Warning  Radar  Specialists,  1981-89 


Legend: White male; Nonwhite male; White female; Nonwhite female

Source: Data are from the Defense Manpower Data Center.


While  nearly  a  third  of  white  males  who  entered  the  services  during  this 
period  qualified  on  the  basis  of  their  Electronics  Composite  scores  for 
this  occupational  specialty,  fewer  than  15  percent  of  white  females 
qualified.  Fewer  than  10  percent  of  minority  males  and  approximately  3 
percent  of  minority  females  qualified. 

The  demographic  differences  are  even  more  sharply  defined  when  the 
occupational  specialty  of  Systems  Repair  Technician  is  examined.  (See 
figures  2.11  and  2.12.) 
Figure 2.11: Number of Recruits
Qualifying for Training as Systems
Repair Technicians, 1981-89

Figure 2.12: Percent of Recruits
Qualifying for Training as Systems
Repair Technicians, 1981-89

Legend: White male; Other

Source: Data are from the Defense Manpower Data Center.


In 1981, 16,563 recruits met the demanding qualifications for training in
this field.4 The number of qualified recruits increased sharply by 1983,
but by the end of the decade it had dropped to within 700 of its 1981
level. The vast majority of these were white males, of whom approximately
11 percent qualified. Fewer than 2 percent of our other demographic
groups met the qualifications.


Summary and
Conclusions


As  we  approach  the  twenty-first  century,  the  sophistication  of  our 
weapons  systems  can  be  expected  to  impose  greater  demands  on  the 
technological  competence  of  the  individual  members  of  the  armed 
forces.  In  addition,  the  youth  pool  from  which  the  services  will  draw 
their  recruits  will  become  increasingly  female  and  minority.  And 
although  we  cannot  foresee  how  reduced  political  tensions  may  ease  the 
demands  on  this  pool,  our  examination  of  recruit  quality  trends  during 
the  1980’s  is  not  reassuring  concerning  the  military’s  ability  to  meet 
these  challenges. 


4This specialty requires an ASVAB Electronics Composite score of 235 and a mechanical score of 247,
requirements that rank it among the most challenging fields in all of the services.
afqt scores and, to a lesser extent, Electronics Composite scores are
higher now than they were in 1981, yet both have begun to decline. The
Electronics Information subtest scores are lower than they were in 1981,
and General Science scores have dropped to near their 1981 level. Thus,
fewer  recruits  are  qualifying  for  the  more  demanding  technical  occupa¬ 
tional  specialties. 

Women  and  minorities  have  traditionally  scored  lower  in  these  areas. 
While  the  gap  between  white  males  and  other  recruits  narrowed  some¬ 
what  in  the  early  1980’s,  since  mid-decade  the  race  and  gender  differ¬ 
ences  have  remained  fairly  constant.  As  we  discussed  in  the  previous 
chapter,  women  and  minorities  will  form  the  bulk  of  the  new-entry  labor 
pool  by  the  year  2000,  and  therefore  providing  well-trained  personnel 
for  a  technologically  sophisticated  military  can  be  expected  to  become 
increasingly  difficult.  The  burden  on  training  will  increase,  and  with  it 
will  come  the  need  to  monitor  the  effectiveness  of  this  training  as  recruit 
demographics  shift. 

In  the  following  chapters,  we  will  address  the  services’  current  ability  to 
measure  the  effectiveness  of  their  training  in  technologically  demanding 
areas.  We  will  also  examine  the  differences  among  gender  and  racial/ 
ethnic  groupings,  and  the  ability  of  the  afqt  and  Electronics  Composite 
scores  to  predict  success  in  technical  military  specialties. 
In this chapter, we address our second evaluation question: How useful
are  the  data  collected  by  the  services  before  and  during  classroom 
training  for  selecting  individuals  for  high  technology  roles  and  for  eval¬ 
uating  the  effectiveness  of  this  training?  Although  we  reviewed  a  broad 
spectrum  of  evaluation-related  materials  and  activities  performed  by 
the  services  at  the  classroom  level,  we  concentrated  on  the  course 
grades  assigned  at  the  end  of  training  and,  in  some  cases,  at  interme¬ 
diate  stages  during  the  training  process.  Our  intention  was  to  define  the 
extent  to  which  appropriate  data  were  available  to  the  services  and  to 
external  reviewers  from  which  some  judgments  could  be  made  about 
training  effectiveness.  We  did  not  attempt  to  perform  an  evaluation  of 
individual  curricula,  training  sites,  or  instructors. 

Our primary criterion for selecting courses for review was that the
qualifying score for course entry, as established by the service, was
relatively high. In addition, we considered annual trainee throughput and
the  recent  stability  of  the  course  curriculum.  Nearly  all  the  courses 
which  met  our  criteria  were  in  the  electronics  area,  and  most  involved 
the  use,  maintenance,  and  repair  of  electronic  equipment,  particularly 
radar  or  sonar.  We  collected  the  course  grades  associated  with  advanced 
individual  training  for  13  occupational  specialties,  four  each  in  the  Navy 
and  Air  Force,  and  five  in  the  Army.  Some  of  the  data  were  collected  at 
the  training  site,  and  some  from  centrally  computerized  records. 

Because  of  large  differences  between  the  services  in  annual  throughput 
of  trainees  in  these  courses,  the  size  of  our  sample  varied  widely  across 
services.  This  variation  was  increased  by  problems  we  encountered  con¬ 
cerning  the  usefulness  of  certain  data  provided  by  the  Army  (see  the 
following  section),  as  well  as  by  our  decision  to  supplement  our  already 
sizable  Navy  data  base  with  relevant  data  previously  collected  by  the 
Navy  for  research  purposes.  Our  final  sample  consisted  of  more  than 
6,000  sailors,  nearly  1,000  Air  Force  personnel,  and  fewer  than  300 
soldiers.  In  this  chapter,  we  present  the  results  of  our  analysis  sepa¬ 
rately  for  each  service. 

We  examined  the  course  data  for  their  apparent  reliability — that  is,  for 
their  apparent  ability  to  discriminate  meaningfully  between  perform¬ 
ances  of  trainees— as  well  as  for  differences  in  training  outcomes  among 
the  demographic  groupings  discussed  in  the  previous  chapter.  We  also 
examined  the  relationship  between  training  outcomes  and  individual 
abilities,  as  measured  by  asvab,  in  order  to  estimate  the  power  of  the 
selection  criteria  to  predict  performance  in  training. 
Army


The  Army  specialties  for  which  we  collected  data  are  listed  in  table  3.1. 


Table 3.1: Army Occupational Specialties
Reviewed

                                                                      Electronics Composite
Specialty  Title                                     Location                qualifying score(a)
24J        Hawk pulse radar repairer                 Redstone Arsenal, Ala.  217
27N        Forward area alerting radar repairer      Redstone Arsenal, Ala.  217
29V        Strategic microwave systems repairer      Fort Gordon, Ga.        217
36L        Transportable automatic systems operator  Fort Gordon, Ga.        217
39B        Automatic test equipment operator         Fort Gordon, Ga.        217

(a) Sum of subtest standard scores


We  found  that  the  course  grades  for  these  five  specialties  were  not 
equally  reliable  indicators  of  performance  during  training.  Whereas  for 
the  two  classes  at  Redstone  Arsenal  final  grades  were  a  simple  arith¬ 
metic  average  of  intermediate  measures  of  performance,  at  Fort  Gordon 
we  were  unable  to  find  a  consistent  relationship  between  individual 
milestone  measures  and  final  grades,  nor  were  we  able  to  locate  anyone 
at  Fort  Gordon  who  could  suggest  one.  We  concluded  that  the  grades 
recorded  for  two  of  these  courses  (36L  and  39B)  could  not  be  used  to 
discriminate  reliably  between  the  performances  of  individual  trainees. 
We  found  inconsistencies  in  scoring  procedures  between  different 
classes  and  even  within  the  same  class.  Finally,  we  discovered  that  the 
Fort  Gordon  grades  (unlike  those  at  Redstone)  were  based  partially  on 
measures  of  physical  conditioning  that  appeared  to  be  unrelated  to  job 
performance. 

For a third training course at Fort Gordon (29V), however, we were able
to  generate  what  we  judged  to  be  reasonable  measures  of  performance 
for  some  classes.  For  these  classes,  we  developed  an  algorithm  to  pro¬ 
duce  scores  based  only  on  those  nonconstant  measures  that  were  related 
to  general  or  applied  electronics  training.1 
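
The report does not describe the improvised scoring algorithm beyond the two conditions stated above, so the following Python sketch is only one plausible reading of it. It assumes that a measure is "nonconstant" when it takes more than one value across the trainees in a class, that the list of electronics-related measures is supplied by the analyst, and that the retained measures are combined by a simple average (as the Redstone Arsenal final grades were). The function names, measure names, and sample records are hypothetical.

```python
from statistics import mean


def nonconstant_measures(class_records, relevant):
    """Keep the relevant measures whose values differ among trainees."""
    keep = []
    for name in relevant:
        distinct_values = {record[name] for record in class_records}
        if len(distinct_values) > 1:
            keep.append(name)
    return keep


def revised_grades(class_records, relevant):
    """Average each trainee's scores over the retained measures only.

    Raises statistics.StatisticsError if no measure is retained.
    """
    keep = nonconstant_measures(class_records, relevant)
    return [mean(record[name] for name in keep) for record in class_records]


# Hypothetical class of two trainees; "fitness" is excluded as unrelated to
# electronics, and "safety_brief" drops out because every trainee scored 100.
sample_class = [
    {"circuits": 80, "radar_lab": 90, "safety_brief": 100, "fitness": 70},
    {"circuits": 90, "radar_lab": 70, "safety_brief": 100, "fitness": 95},
]
electronics_measures = ["circuits", "radar_lab", "safety_brief"]
sample_grades = revised_grades(sample_class, electronics_measures)
```

Dropping constant measures matters because a score that is identical for every trainee cannot help distinguish one trainee's performance from another's.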


1  External  corroboration  of  the  preferability  of  this  improvised  scoring  procedure  was  provided  by 
our  later  analysis  of  the  relationship  between  grades  and  ASVAB.  The  correlation  between  original 
29V grades and the Electronics Composite was negative and nonsignificant. The revised grades were
positively  (.60)  and  significantly  correlated  (p  <  .01)  with  this  ASVAB  score. 
Our final sample was therefore composed of U.S. Army trainees from
those  24J  and  27N  classes  conducted  in  fiscal  years  1985  through  1988 
whose  records  were  available  at  the  time  of  our  visit,  and  approximately 
one  third  of  the  29V  trainees  from  the  same  period.  Table  3.2  presents 
the  mean  scores  of  this  sample  on  afqt,  the  Electronics  Composite  of 
asvab,  and  course  grades.2 


Table 3.2: Mean Scores on Predictor and
Criterion Variables, Army

                  AFQT            Electronics Composite         Grade
Category    Number   Mean(a)      Number   Mean(a)         Number   Mean
Male           280   232.15          280   238.46             232   89.23
Female          23   232.87           23   230.13              23   86.08
White          255   234.00          255   240.00             160   90.19
Nonwhite        48   222.67           48   226.29              95   86.86
Total          303   232.20          303   237.83             255   88.95

(a) Sum of subtest standard scores

Male trainees in these courses scored significantly higher than did
females, and white trainees performed better than minority students.
These  performance  differences  correspond  to  group-level  differences  in 
both  afqt  and  Electronics  Composite  scores  for  racial/ethnic  groupings. 

The  group  means  presented  in  table  3.2  also  suggest  that  afqt  and  Elec¬ 
tronics  Composite  scores  do  not  equally  predict  success  in  training,  at 
least  for  females.  While  female  trainees  entered  training  with  Elec¬ 
tronics  Composite  scores  significantly  lower  than  those  of  males,  the 
afqt  scores  of  female  and  male  trainees  were  equivalent.  In  other  words, 
it  would  appear  that  Electronics  Composite  scores  are  a  better  indication 
of  future  performance  in  these  occupational  specialties  than  are  afqt 
scores. This is consistent with asvab’s role in the military accession
process: potential recruits are admitted to service on the basis of afqt
scores,  and  then  are  assigned  to  occupational  specialties  for  which  they 
qualify  on  the  basis  of  their  scores  on  other  asvab  composites. 

We  tested  this  hypothesis  more  directly  by  examining  the  correlations 
between course grades and three asvab scores: afqt, Electronics
Composite, and a “factor score.” This last measure is the weighted sum of all
ten asvab subtests. We derived this last score by principal component
analysis  of  asvab  subtest  scores.  The  results  of  our  correlation  analysis 
are  displayed  in  table  3.3. 
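
The “factor score” described above is a weighted sum of subtest scores, with the weights taken from the first principal component. The Python sketch below shows the general idea under stated assumptions: subtest scores are standardized before weighting, the first component is found by power iteration on the correlation matrix instead of with a statistics package, and only three subtests and five recruits are used for brevity. The report does not describe its computation at this level of detail, so the function name, the standardization step, and the sample data are all hypothetical.

```python
import math


def first_component_scores(rows, iterations=200):
    """Return (weights, scores) for the first principal component.

    rows: one list of subtest scores per recruit.
    weights: unit-length loading vector of the correlation matrix.
    scores: weighted sum of each recruit's standardized subtest scores.
    """
    n = len(rows)
    p = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / (n - 1))
        for j in range(p)
    ]
    # Standardize each subtest to mean 0 and standard deviation 1.
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in rows]
    # Correlation matrix of the subtests.
    corr = [
        [sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Power iteration: repeated multiplication converges on the dominant
    # eigenvector, which is the first principal component.
    weights = [1.0] * p
    for _ in range(iterations):
        weights = [sum(corr[a][b] * weights[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(w * w for w in weights))
        weights = [w / norm for w in weights]
    scores = [sum(weights[j] * zr[j] for j in range(p)) for zr in z]
    return weights, scores


# Hypothetical standard scores on three subtests for five recruits.
sample_rows = [
    [50, 52, 48],
    [55, 57, 53],
    [60, 58, 61],
    [45, 47, 44],
    [52, 50, 55],
]
sample_weights, sample_factor_scores = first_component_scores(sample_rows)
```

Unlike afqt or the Electronics Composite, which use fixed subsets of subtests, a score of this kind draws on every subtest and lets the data determine how heavily each one counts.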


2See appendix II for similar statistics on the course level.
Table 3.3: Intercorrelation of Study
Variables, Army(a)

                                      Electronics                 Grade(e)
Category                 AFQT(b)      Composite(c)  Factor(d)     Raw         Adjusted(f)

Total
  AFQT                   1.00         0.81(g)       0.84(g)       0.29(g)     0.41(g)
  Electronics Composite  303          1.00          0.89(g)       0.43(g)     0.59(g)
  Factor                 303          303           1.00          0.42(g)
  Grade                  189          189           189           1.00

Male
  AFQT                   1.00         0.83(g)       0.85(g)       0.31(g)     0.43(g)
  Electronics Composite  280          1.00          0.89(g)       0.42(g)     0.58(g)
  Factor                 280          280           1.00          0.41(g)
  Grade                  171          171           171           1.00

Female
  AFQT                   1.00         0.82(g)       0.87(g)       0.42        0.53(g)
  Electronics Composite  23           1.00          0.89          0.35        0.51(g)
  Factor                 23           23            1.00          0.35
  Grade                  18           18            18            1.00

White
  AFQT                   1.00         0.80(g)       0.82(g)       0.24(g)     0.38(g)
  Electronics Composite  255          1.00          0.87(g)       0.40(g)     0.60(g)
  Factor                 255          255           1.00          0.40(g)
  Grade                  154          154           154           1.00

Nonwhite
  AFQT                   1.00         0.78(g)       0.85(g)       0.19        0.22
  Electronics Composite  48           1.00          0.89(g)       0.30        0.40
  Factor                 48           48            1.00          0.26
  Grade                  35           35            35            1.00

(a) Correlation coefficients are in upper diagonal and number in lower diagonal.
(b) AFQT = sum of subtest standard scores
(c) Electronics Composite = sum of subtest standard scores for Electronics Composite
(d) Factor = score from first factor from principal component analysis
(e) Grade = final course grade
(f) Adjusted = correlation adjusted for restriction of range
(g) p < .05


GAO/PEMD-91-t Military Technical-Training Effectiveness Is Unknown

Chapter 3
Classroom Measures of
Training Effectiveness

For our whole Army sample, the variation within Electronics Composite
scores explains approximately 18 percent of the variation within course
grades, more than factor scores and substantially more than afqt.3 In
most  cases,  Electronics  Composite  scores  are  somewhat  better  predictors 
of  grades  than  are  afqt  scores,  whether  a  simple  correlation  coefficient 
or  a  coefficient  adjusted  for  range  restriction  is  used  as  a  criterion.4  This 
is not true, however, for female soldiers, for whom afqt predicts
classroom performance better than the Electronics Composite does. In most
cases,  asvab  factor  scores  provide  stronger  predictions  than  either  afqt 
or  the  Electronics  Composite.  Our  ability  to  predict  course  grades  from 
any  of  the  three  asvab  scores  is  weakest  for  minority  soldiers  as  a  group. 

Our  analysis  of  nonwhite  and  female  soldiers  is  unfortunately  based  on 
a  relatively  small  sample.  Nevertheless,  it  suggests  that  afqt  or  some 
other  general  score  from  asvab  may  provide  a  better  predictor  of  success 
for  women  recruits  in  electronics-related  training  than  does  the  Elec¬ 
tronics  Composite  score.  It  also  indicates  that  we  need  better  predictors 
than  we  currently  have  for  minority  students. 


Navy 


We  examined  four  Navy  training  courses,  two  each  from  the  Antisub¬ 
marine  Warfare  School  in  San  Diego  and  the  Naval  Air  Station  in 
Millington,  Tennessee.  They  are  listed  in  table  3.4. 


3A correlation coefficient is the square root of common variance. In this case, the Electronics
Composite score from ASVAB shares 18.5 percent (.43²) of variance with grades, or, after adjustment, 35
percent (.59²).
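
The arithmetic in this footnote can be reproduced directly: the share of variance two measures have in common is the square of their correlation coefficient. The short Python sketch below uses the two coefficients quoted in the footnote; the function name is hypothetical.

```python
def shared_variance_percent(r):
    """Percent of variance two measures share, given their correlation r."""
    return 100.0 * r * r


raw_share = shared_variance_percent(0.43)       # about 18.5 percent
adjusted_share = shared_variance_percent(0.59)  # about 35 percent
```

Squaring makes differences between coefficients look larger than they first appear: moving from .43 to .59 nearly doubles the variance explained.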

4The adjustment for restriction in range is common among psychometricians and appears in all DOD
reports that we reviewed. Since correlations are simply measures of the extent to which two
measures vary in common, any restriction to the variation of one of the measures results in an
underestimate of their common variation. This restriction occurs when the sample includes only one end of a
spectrum of scores, as is the case for any measure used for selection purposes. Our sample includes
only those whose AFQT scores were sufficiently high to permit acceptance into military service. The
adjusted  correlation  coefficient  represents  the  hypothetical  relationship  between  the  ASVAB  measure 
and  course  grades  if  this  range  restriction  did  not  exist  for  our  sample. 
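
The report does not state which correction formula was applied. As a hedged illustration only, the sketch below uses one standard textbook correction for direct restriction on the predictor (often called Thorndike's Case II). The function name and the standard deviations in the example are hypothetical and are not taken from the report.

```python
import math


def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    """Estimate the correlation in the unrestricted population.

    r               -- correlation observed in the selected (restricted) sample
    sd_unrestricted -- predictor standard deviation in the full applicant pool
    sd_restricted   -- predictor standard deviation in the selected sample
    """
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1.0 + r * r * (u * u - 1.0))


# Hypothetical example: the selected sample has two-thirds of the
# applicant pool's spread on the predictor.
print(round(correct_for_range_restriction(0.43, 15.0, 10.0), 2))
```

When the two standard deviations are equal the function returns the observed correlation unchanged, which matches the footnote's point that the adjustment matters only when selection has narrowed the range of scores.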
Table 3.4: Occupational Specialties Reviewed, Navy

                                                         Electronics
                                                         Composite
                                                         qualifying
Specialty  Title                     Location            score(a)

STG        Sonar technician,         San Diego, Calif.   218
           antisubmarine warfare,
           surface

STS        Sonar technician,         San Diego, Calif.   218
           antisubmarine warfare,
           subsurface

AQ         Aviation fire control     Millington, Tenn.   218
           technician

AX         Aviation antisubmarine    Millington, Tenn.   218
           warfare technician

(a) Sum of subtest standard scores


We  were  able  to  achieve  a  much  larger  sample  size  (6,156)  for  these 
courses  than  was  the  case  for  our  Army  courses  (303)  because  of  their 
larger  annual  throughput,  and  because  the  Naval  Personnel  Research 
and  Development  Center  provided  us  with  relevant  data  that  they  had 
collected  on  STS  and  STG  specialties  for  fiscal  years  1986  and  1987. 
These  data  supplemented  the  fiscal  year  1988  and  fiscal  year  1989  data 
that  we  collected  at  the  San  Diego  base.  Millington  provided  us  with 
training  data  for  1987  and  1988.  Table  3.5  presents  the  mean  scores  on 
the two ASVAB composites and course grades for the entire Navy sample.
Statistics  on  individual  courses  are  presented  in  appendix  II. 


Table 3.5: Mean Scores on Predictor and Criterion Variables, Navy

                                 Electronics
           AFQT                  Composite             Grade
Category   Number   Mean(a)      Number   Mean(a)      Number   Mean

Male       6,080    229.60       6,080    235.33       5,882    89.11
Female     76       235.59       76       230.66       71       90.70
White      5,355    230.49       5,355    236.25       5,179    89.21
Nonwhite   801      224.18       801      228.75       1,159    89.58
Total      6,156    229.67       6,156    235.28       6,443    89.30

(a) Sum of subtest standard scores

Male recruits entered training with significantly lower AFQT scores and
significantly higher Electronics Composite scores than those for females.
Final grades for males were slightly, but significantly, lower than those
for their female classmates. These results suggest that, at least for
females, a substantial advantage in AFQT can overcome a disadvantage
in the Electronics Composite. In addition, minority students began
training with substantially lower scores than nonminorities on both AFQT
and the Electronics Composite. The final grades of the two groups were
not significantly different.

The results of our correlation analysis appear in table 3.6. They suggest
that AFQT may be more important for training success than the
Electronics Composite. For most Navy groupings, AFQT scores are better
predictors of classroom performance than are Electronics Composite
scores. When adjusted, they explain from 12 to 38 percent of the
variation in course grades. Once again, the Electronics Composite is the
weakest of the three predictors for female sailors, and the more general
factor score is the strongest. The ability of any of the three ASVAB scores
to predict training success is weakest for minorities.
Table 3.6: Intercorrelation of Study Variables, Navy(a)

                                            Electronics               Grade(e)
Category  Variable               AFQT(b)    Composite(c)  Factor(d)   Raw       Adjusted(f)

Total     AFQT                   1.00       0.79(g)       0.80(g)     0.30(g)   0.46(g)
          Electronics Composite  6,156      1.00          0.85(g)     0.27(g)   0.46(g)
          Factor                 6,156      6,156         1.00        0.28(g)
          Grade                  5,939      5,939         5,939       1.00

Male      AFQT                   1.00       0.79(g)       0.81(g)     0.30(g)   0.46(g)
          Electronics Composite  6,080      1.00          0.85(g)     0.27(g)   0.46(g)
          Factor                 6,080      6,080         1.00        0.27(g)
          Grade                  5,868      5,868         5,868       1.00

Female    AFQT                   1.00       0.74(g)       0.81(g)     0.39(g)   0.62(g)
          Electronics Composite  76         1.00          0.82(g)     0.32(g)   0.55(g)
          Factor                 76         76            1.00        0.39(g)
          Grade                  71         71            71          1.00

White     AFQT                   1.00       0.79(g)       0.81(g)     0.30(g)   0.47(g)
          Electronics Composite  5,355      1.00          0.85(g)     0.29(g)   0.50(g)
          Factor                 5,355      5,355         1.00        0.30(g)
          Grade                  5,165      5,165         5,165       1.00

Nonwhite  AFQT                   1.00       0.74(g)       0.77(g)     0.22(g)   0.34(g)
          Electronics Composite  801        1.00          0.81(g)     0.14(g)   0.25(g)
          Factor                 801        801           1.00        0.11(g)
          Grade                  774        774           774         1.00

(a) Correlation coefficients are in upper diagonal and number in lower diagonal.
(b) AFQT = sum of subtest standard scores
(c) Electronics Composite = sum of subtest standard scores for Electronics Composite
(d) Factor = score from first factor from principal component analysis
(e) Grade = final course grade
(f) Adjusted = correlation adjusted for restriction of range
(g) p < .05


Air Force


The four Air Force training courses we reviewed are listed in table 3.7.
Our sample size from these courses totaled 922. Statistics for individual
courses are provided in appendix II. (We received both training and
demographic data on all of these courses from the Air Force Human
Resources Laboratory.)


Table 3.7: Occupational Specialties Reviewed, Air Force

                                                           Electronics
                                                           Composite
                                                           qualifying
Specialty  Title                      Location             score(a)

30332      Aircraft control and       Keesler AFB, Miss.   230
           warning radar specialist

30333      Automatic tracking radar   Keesler AFB, Miss.   225
           specialist

45530A     Photo-sensors              Lowry AFB, Colo.     225
           maintenance specialist,
           tactical reconnaissance
           sensors

45530B     Photo-sensors              Lowry AFB, Colo.     225
           maintenance specialist,
           reconnaissance electro-
           optical sensors

(a) Sum of subtest standard scores


Trainees’ ASVAB scores and course grades are displayed in table 3.8. As
would be expected, ASVAB scores for Air Force students are significantly
higher than those for the other services we reviewed. In addition, we
found a higher proportion of female trainees in the Air Force courses
than in the Army and Navy courses we reviewed.


Table 3.8: Mean Scores on Predictor and Criterion Variables, Air Force

                                 Electronics
           AFQT                  Composite             Grade
Category   Number   Mean(a)      Number   Mean(a)      Number   Mean

Male       824      235.45       824      241.94       854      91.31
Female     98       237.73       98       235.88       100      89.91
White      825      236.22       825      241.95       855      91.21
Nonwhite   97       231.19       97       235.73       99       90.76
Total      922      235.69       922      241.30       954      91.16

(a) Sum of subtest standard scores

Male Air Force recruits entered training with substantially higher
Electronics Composite scores and slightly, but significantly, lower AFQT
scores than did female recruits. Despite the slight female AFQT
advantage, male recruits ended training with higher course grades than those
earned by female recruits. In addition, although white students began
training with substantially higher ASVAB scores, their final grades were
not significantly different from those of their nonwhite classmates.

As table 3.9 demonstrates, the correlations between ASVAB and Air Force
training grades followed much the same pattern as did the Navy’s. When
correlations are adjusted, the traditional ASVAB composite scores explain
from 6 to 36 percent of classroom performance. Factor scores are as
good as, or better than, composites as predictors. For female students,
AFQT scores outpredict Electronics Composite scores. Once again, it is
most difficult to predict course grades for minority students, although
factor scores explained 10 percent of their classroom performance.
Table 3.9: Intercorrelation of Study Variables, Air Force(a)

                                            Electronics               Grade(e)
Category  Variable               AFQT(b)    Composite(c)  Factor(d)   Raw       Adjusted(f)

Total     AFQT                   1.00       0.71(g)       0.75(g)     0.29(g)   0.44(g)
          Electronics Composite  922        1.00          0.84(g)     0.33(g)   0.54(g)
          Factor                 922        922           1.00        0.35(g)
          Grade                  922        922           922         1.00

Male      AFQT                   1.00       0.74(g)       0.77(g)     0.30(g)   0.44(g)
          Electronics Composite  824        1.00          0.84(g)     0.33(g)   0.54(g)
          Factor                 824        824           1.00        0.34(g)
          Grade                  824        824           824         1.00

Female    AFQT                   1.00       0.68(g)       0.77(g)     0.35(g)   0.54(g)
          Electronics Composite  98         1.00          0.77(g)     0.26(g)   0.50(g)
          Factor                 98         98            1.00        0.28(g)
          Grade                  98         98            98          1.00

White     AFQT                   1.00       0.72(g)       0.75(g)     0.31(g)   0.47(g)
          Electronics Composite  825        1.00          0.83(g)     0.35(g)   0.58(g)
          Factor                 825        825           1.00        0.35(g)
          Grade                  825        825           825         1.00

Nonwhite  AFQT                   1.00       0.65(g)       0.68(g)     0.19      0.24(g)
          Electronics Composite  97         1.00          0.82(g)     0.23(g)   0.33(g)
          Factor                 97         97            1.00        0.31(g)
          Grade                  97         97            97          1.00

(a) Correlation coefficients are in upper diagonal and number in lower diagonal.
(b) AFQT = sum of subtest standard scores
(c) Electronics Composite = sum of subtest standard scores for Electronics Composite
(d) Factor = score from first factor from principal component analysis
(e) Grade = final course grade
(f) Adjusted = correlation adjusted for restriction of range
(g) p < .05


Summary  and 
Conclusions 


Our review of advanced individual training courses, designed to
prepare recruits in three services to serve in certain “high technology”
roles, identified some problems with the utility of data maintained by
the Army on classroom performance in certain specialties. It would not
be appropriate to make interservice comparisons on the basis of this
finding, however, since much of the Navy training information and all of
the data we received from the Air Force were specially prepared for
research purposes. We cannot therefore make firm judgments about the
immediate availability of psychometrically suitable measures from these
two services.

The  psychometric  deficiencies  we  found  at  Fort  Gordon  appeared  to 
result  from  a  number  of  different  factors,  including  questionable  data 
entry  procedures  and  software.  They  are  also  a  function  of  the  pass/fail 
nature  of  the  criteria  used  to  evaluate  student  progress.  We  cannot 
assess  the  extent  to  which  performance  on  individual  training  tasks  is 
susceptible  to  more  sophisticated  measures  than  “go/no-go,”  but  we 
would  suggest  that  subject  matter  experts  attempt  to  develop  more 
finely  tuned,  objective,  and  reliable  measures  of  performance. 

Our review also raised certain questions about differential success in
training for males and females, and for whites and minorities, and about
the differential predictive validity of ASVAB for these subgroups. Our
analysis of gender- and race-related differences in mean ASVAB scores
and course grades in the Army suggested that the Electronics Composite
was an efficient simple predictor of training success. Women and
minorities entered training with significantly lower Electronics Composite
scores and received significantly lower course grades.

Our findings from the Navy and Air Force samples, however, suggest
that a more complex relationship exists between ASVAB and course
grades. For these services, gender- and race-related differences in course
grades were small or nonexistent, despite significant differences in
Electronics Composite scores. The Navy and Air Force samples also differed
from the Army sample in three other respects: (1) Electronics course
grade differences, though significant, were much smaller in the Navy
and Air Force than in the Army; (2) unlike women soldiers, Navy and
Air Force women had significantly higher AFQT scores than their male
classmates; and (3) the AFQT disadvantage for minorities in the Navy
and Air Force was only half of that in the Army. These findings suggest
that an advantage in the more general aptitude measured by AFQT (or by
an even more general measure such as a factor score) can compensate
for a deficit in the Electronics Composite when the deficit is not too
great. In other words, success in training may be related as much to
general ability as to performance on the Electronics Composite.

This interpretation is consistent with the results of our correlation
analyses, which tested the relationship between ASVAB scores and course
grades more directly. While ASVAB’s Electronics Composite score
demonstrated a moderate ability to predict success in training for white male
students, it was less successful for female or minority students. The
factor score we derived from ASVAB was in most cases the best simple
predictor of training success because it utilized information from all ten
ASVAB subtests, and not simply from the subset used for AFQT or the
Electronics Composite. However, all three ASVAB measures (AFQT, Electronics,
and factor scores) in most cases proved to be relatively weak predictors
of performance in training for minority students.
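
To illustrate what a factor score of this kind involves, the following is a minimal sketch of scoring recruits on the first principal component of ten standardized subtests. It is not the procedure used in this review, whose software and settings are not described here. The data are synthetic, all function names are hypothetical, and power iteration is substituted for a full principal component routine to keep the example dependency-free.

```python
import math
import random


def standardize(columns):
    """Convert each column of raw scores to z-scores (mean 0, SD 1)."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        result.append([(x - mean) / sd for x in col])
    return result


def correlation_matrix(z_columns):
    """Correlation matrix of columns that are already standardized."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_columns]
            for ci in z_columns]


def first_component_weights(matrix, iterations=500):
    """Power iteration: unit-length dominant eigenvector of a symmetric matrix."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def factor_scores(columns):
    """Score each person on the first principal component of the subtests."""
    z = standardize(columns)
    weights = first_component_weights(correlation_matrix(z))
    n = len(columns[0])
    return [sum(weights[j] * z[j][i] for j in range(len(z))) for i in range(n)]


# Synthetic data: 200 hypothetical recruits, ten subtests that all draw on
# one general ability plus independent noise.
random.seed(0)
ability = [random.gauss(0, 1) for _ in range(200)]
subtests = [[50 + 10 * (0.7 * g + 0.7 * random.gauss(0, 1)) for g in ability]
            for _ in range(10)]
scores = factor_scores(subtests)
```

Because every subtest contributes to the weighted sum, a score built this way draws on all ten subtests rather than the subset behind any single composite, which is the property the paragraph above points to.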

Correlations do not imply causality, nor does the lack of a correlation
for a subsample indicate the location of a problem. From our analyses it
is impossible to conclude either that ASVAB is a weaker measure of ability
for some groups, or that some factor in classroom training contributes
differentially to the success of different groups. Yet, as the youth pool
shrinks and its demographic characteristics shift, the military will find
itself turning more toward minority and female recruits. These groups,
as we have seen, consistently score lower in the measures used to assign
recruits to technical training and in our largest service are less likely to
perform well. It will become increasingly incumbent on all services to
optimize selection criteria for technical advanced individual training for
women and minority groups, to provide compensatory training where
needed, and to assure that no extraneous factors within the training
environment interfere with the full development of a recruit’s potential.

Whatever criteria may exist to predict or to assess a recruit’s
performance in training, the ultimate criterion of training effectiveness is the
recruit’s performance on the job. Our third evaluation question
addresses this issue: How well do the services’ selection criteria and
training evaluation measures predict success in high technology roles?

To answer this question, we attempted to locate individual
field-performance data routinely collected by the services that could be linked to
our ASVAB and classroom training data to serve as reliable and valid
indicators of training effectiveness. Although we were made aware
of numerous post-training evaluation activities performed by the
individual services, only the Army could provide us with individual
performance measures. In this chapter, we will examine the quantitative
relationship between these Army data and the other information we
compiled. We will also discuss other evaluation mechanisms used by the
services and suggest a potential alternative source of post-training
evaluation measures.


Army 


Skill Qualification Test

By Army regulation, a soldier’s occupational specialty performance is
tested within six months of completion of training and every year
thereafter. These written tests are prepared by the sponsoring training site.
They are administered under the direction of the Skill Qualification Test
(SQT) directorate at Fort Eustis, Virginia, where the resulting data are
stored.

Fort Eustis provided us with the SQT scores of all soldiers who took the
SQT from 1985 to 1988 in the occupational specialties we had chosen for
our sample. Summary statistics for these data are provided in appendix
IV. We matched these scores, where possible, with ASVAB scores and
classroom grades for each soldier included in our training site review.1
Table 4.1 presents the scores of these soldiers summarized by
demographic groups, together with the correlation coefficient estimating the
relationship between SQT and the measures we examined in the previous
chapter.


1  For  soldiers  with  multiple  SQT  scores  during  this  period,  we  used  only  the  first  score. 
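
The matching step described above, including the first-score rule in this footnote, can be sketched as follows. This is an illustration under assumed record layouts, not the procedure actually used. The field names, identifiers, and values are all hypothetical.

```python
def first_sqt_scores(sqt_records):
    """Keep only the earliest SQT score for each soldier.

    sqt_records -- iterable of (soldier_id, test_year, score) tuples
    """
    first = {}
    for soldier_id, test_year, score in sqt_records:
        if soldier_id not in first or test_year < first[soldier_id][0]:
            first[soldier_id] = (test_year, score)
    return {sid: score for sid, (_, score) in first.items()}


def link_records(training_records, sqt_records):
    """Attach the first SQT score to each training record that has a match."""
    first = first_sqt_scores(sqt_records)
    return [dict(rec, sqt=first[rec["id"]])
            for rec in training_records if rec["id"] in first]


# Hypothetical records: soldier A1 took the SQT twice, C3 never took it.
training = [
    {"id": "A1", "afqt": 228, "grade": 91.0},
    {"id": "B2", "afqt": 215, "grade": 84.5},
    {"id": "C3", "afqt": 240, "grade": 88.0},
]
sqt = [("A1", 1987, 80), ("A1", 1986, 75), ("B2", 1988, 88)]

linked = link_records(training, sqt)
```

Soldiers without any SQT record drop out of the linked set, which is one reason the matched sample in table 4.1 is smaller than the training sample.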
Table 4.1: Correlation of SQT and Predictor Variables

                                      Correlation with SQT
                                                   Electronics
Category  Mean   Number  Correlation  AFQT(a)      Composite(b)  Factor(c)  Grade(d)

Male      82.12  209     Raw          0.21(f)      0.28(f)       0.36(f)    0.47(f)
                         Adjusted(e)  0.30(f)      0.41(f)

Female    77.52  21      Raw          -0.07        0.12          -0.03      -0.52(f)
                         Adjusted(e)  -0.10        0.19

White     81.86  144     Raw          0.21(f)      0.25(f)       0.32(f)    0.44(f)
                         Adjusted(e)  0.33(f)      0.40(f)

Nonwhite  81.45  86      Raw          [illegible]  0.07          0.12       0.44(f)
                         Adjusted(e)  -0.22        0.10

Total     81.70  230     Raw          0.18(f)      0.28(f)       0.34(f)    0.43(f)
                         Adjusted(e)  0.26(f)      0.41(f)

(a) AFQT = sum of subtest standard scores
(b) Electronics Composite = sum of subtest standard scores for Electronics Composite
(c) Factor = score from first factor from principal component analysis
(d) Grade = final course grade
(e) Adjusted = adjusted for restriction of range
(f) p < .05

For the total universe of soldiers the best simple predictor of SQT scores
is final classroom grades, which explain 18.5 percent of the variation in
SQT scores. The AFQT and Electronics scores from ASVAB were also
significantly related to SQT scores for white males in our sample, but factor
scores consistently outpredicted these composites. For females and for
nonwhite soldiers, however, ASVAB scores were not positively related to
future performance as measured by SQT. Most surprisingly, the grades
scored by female students at the training site were inversely correlated
with their SQT scores; that is, women with higher grades tended to
score lower on the SQT, and vice versa.

The limited size of our sample, especially for female soldiers, makes it
inappropriate to generalize without severe caveats. However, our
analysis suggests that the traditional ASVAB scores may not be the best
predictor of performance for the nontraditional (that is, the female or
minority) soldier. This finding reinforces the concern we expressed in
the last chapter, that better predictors of success for these groups
should be found. Any interpretation of the inverse relationship between
grades and SQT scores for women would be purely speculative, but this
anomaly warrants further investigation.
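
The caution about sample size can be made concrete with the usual t test for a correlation coefficient. The report does not say which significance test was applied, so the sketch below is an assumption. It takes the female sample size and two of the female raw correlations from table 4.1, and uses the standard two-tailed .05 critical value of Student's t for 19 degrees of freedom.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Two-tailed .05 critical value of Student's t with 19 degrees of freedom.
T_CRITICAL_DF19 = 2.093

female_n = 21  # female soldiers with matched SQT scores (table 4.1)

t_grade = correlation_t_statistic(-0.52, female_n)        # grade vs. SQT
t_electronics = correlation_t_statistic(0.12, female_n)   # Electronics Composite vs. SQT

print(abs(t_grade) > T_CRITICAL_DF19)        # True: distinguishable from zero
print(abs(t_electronics) > T_CRITICAL_DF19)  # False: not distinguishable from zero
```

With only 21 cases, a correlation must be fairly large in absolute value before it can be told apart from zero, so the nonsignificant coefficients for women say little about the true relationship in either direction.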


Other Evaluation-Related
Activities


Each Army training site includes an evaluation unit that performs
regular process evaluations. These include classroom observations of
instructors, annual meetings to review curricula, cyclical outreach
programs to contact graduates of the school in the field and their
supervisors, and occasional more intensive curriculum reviews called training
effectiveness analyses.


Classroom observations are conducted on a regular basis by both master
trainers and the training site internal evaluation unit. They are
performed more frequently when instructors are new or have received
less-than-satisfactory evaluations. Most of the observation reports that we
reviewed, particularly those performed by the internal evaluation unit,
were mainly concerned with administrative details. The most frequent
criticism we encountered was that copies of the lesson plan and
curriculum materials were not properly arranged and situated at an empty
desk in the rear of the classroom for the observer.


Schoolhouse external evaluation units also conduct outreach programs
during which members of the units travel to Army bases (where a large
concentration of the training-site graduates is stationed) to collect
information on the opinions of base staff about training quality. These
reviews occur approximately every two or three years for the courses
we reviewed, but they are not routinely scheduled. They are more
frequently occasioned by indications from the field of training problems,
and their frequency is also affected by travel-budget considerations.

More objective and formal training effectiveness analyses are performed
when a new training course is introduced or when weapons system
modifications prompt major changes in the curriculum. These analyses
include written tests, hands-on tests, and interviews with soldiers and
their supervisors. The most recent training effectiveness analysis for the
courses we reviewed was conducted during the summer of 1987 and was
prompted by changes to the Hawk missile system.


Navy


Sources  of  Individual  Field 
Performance  Data 


We considered two possible sources of field performance information
routinely collected by the Navy as measures of the effectiveness of the
training courses in our sample: Level II surveys and Advancement in
Rating Examinations. The Level II survey program was designed to
collect information on the job performance of recent training-school
graduates.2 For each course, questionnaires were sent to the supervisors of
graduates approximately six months after graduation, asking them to
rate the importance of individual tasks performed within the specialty
and the adequacy of the level of training demonstrated by the
course graduates. We found, however, that Level II surveys have been
effectively abandoned by the Navy, and that none has been performed
since at least 1986.


Advancement in Rating Examinations are multiple-choice tests
administered to candidates for promotion who have already been certified as
qualified by their commanding officers. Different tests are prepared for
each promotion cycle, and their results are used to rank candidates.
Because they are not standardized, and are not administered to all
graduates, these tests, in the judgment of test developers and administrators,
are “not a good source of training evaluation feedback.” We concurred
with this judgment.


Internal  Review  of 
Evaluation  Practices 


In 1986, the Chief of Naval Operations requested that the Naval
Training Systems Center (NTSC) determine the current status of Navy
training evaluation and provide recommendations for the future conduct
of such operations. NTSC submitted three reports to the Chief of Naval
Technical Training in 1988. They identified three central evaluation
functions: Level II surveys, the Fleet Training Assessment Program
(FLETAP), and the Training Assessment Survey Team (TAST). The TAST
concept had only recently been established at the time of the NTSC
report, and only two surveys had been completed under the program.
These surveys were limited to new weapons systems and involved fleet
visits to identify training deficiencies and requirements and any
corrective actions that needed to be taken.


2 The term derives from a classification of evaluation intensiveness established in 1981 by the Naval
Education Training Command. Level I refers to unsolicited feedback to training sites concerning
training adequacy, Level II to a questionnaire sent to the fleet, and Level III to an in-depth analysis of
problems identified in lower level reviews.

FLETAP is currently a reactive system that attempts to identify training
deficiencies through either direct input from the fleet or review of
reports and other fleet materials. FLETAP is also responsible for
performing Training Quality Reviews, which involve administering job
performance tests to fleet personnel to measure adequacy of training. No
such reviews have been completed. The FLETAP component responsible
for the Pacific Fleet consists of five full-time staff positions, four of
which were filled at the time of our visit there. Its Atlantic Fleet
counterpart has four authorized staff positions, three of which were filled.

The NTSC report also identified numerous other nonformal or
noncentralized evaluation and evaluation-related activities within the Navy’s
training community. However, NTSC found that the quality of current
Navy classroom training cannot be readily ascertained for the vast
majority of courses; that there is a general lack of technical
evaluation/assessment skills; and that current evaluation activities are
fractionated, not comprehensive, and operating in an environment of obsolete
instructions and unclear objectives. NTSC concluded that the fleet’s mandate to
provide useful data to the training community about the performance of
its graduates needed to be enforced and that fleet evaluation activities
should be upgraded and appropriately staffed. It also recommended that
internal training appraisal responsibility be decentralized to the training
site level and that independent external programs be reviewed for
technical adequacy and integrated into an overall systematic approach.

In response to these reports, a three-person team has recently been
established at the headquarters of the Chief of Naval Education and
Training to review the NTSC proposals and recommend an integrated
training appraisal program. No firm timetable has yet been established
for the team’s report, but they anticipate providing a proposal in the
summer of 1990. We welcome this Navy effort, but we question whether
this response will prove adequate in view of the severity and
extensiveness of the problems NTSC has documented.


Air Force


Sources  of  Individual  Field 
Performance  Data 


We considered sources of individual-level data for field performance of
Air Force personnel equivalent to those we considered for the Navy,
that is, promotion examinations and supervisory surveys. After
interviewing Air Force personnel, however, we concluded that neither was
appropriate for our purposes.


Unlike the Navy’s Level II surveys, the Air Force supervisory surveys
are still in use. They are conducted by the training sites’ evaluation
units for each training course at 2- to 3-year intervals. Questionnaires
are sent to the supervisors of recent training graduates to determine
how frequently the graduates perform each of the major tasks for which
they were trained, and how well they perform them. A summary training
evaluation report is produced from these data identifying task-specific
training deficiencies and/or unnecessary training. We were informed
that the individual-level data collected by these surveys are not
maintained by the training sites after their reports have been prepared.
Therefore, no individual data exist that would allow us to perform
analyses equivalent to those we performed using the Army SQT data.


Other  Evaluation-Related 
Activities 


Other training assessment procedures exist, including training quality
reports, utilization and training workshops, and occupational survey
reports. Training quality reports provide a means for supervisors of
recent training-site graduates to report apparent deficiencies in a
recruit’s training. Like the Navy’s FLETAP activities, these reports are
part of a reactive evaluation process. A succession of training quality
reports for a given course can lead to a complete course review. The
other activities are more concerned with front-end analysis.
Occupational survey reports on occupational specialties are prepared
approximately every three to four years. They are based on questionnaires
designed to define the major tasks performed by specialists and their
relative frequency. Utilization and training workshops are held when
the job requirements of an old occupational specialty change
dramatically or when a new specialty is defined. Major command functional
officers, training staff officers, and managers at the Air Force technical
schools participate by examining data from occupational survey reports
and identifying the specific training requirements of the specialty.


Alternative Data
Sources: The Job
Performance
Measurement Project


A key impediment to establishing a field evaluation component of
training assessment is the expense of developing, testing, and
administering measures that validly and reliably measure actual performance.
Since the early 1980’s, a major effort to address these measurement
issues has been under way under the direction of the Office of Accession
Policy of the Office of the Assistant Secretary of Defense for Force
Management and Personnel. Known as the Joint-Service Job Performance
Measurement (JPM) project, the effort was initiated at the request of the
Congress to validate ASVAB measures against actual performance in the
field, instead of against training grades, which had been the sole
criterion. The project was triggered by the discovery of the ASVAB
misnorming in the late 1970’s, which unintentionally allowed some 300,000
less qualified recruits into the services and resulted in field
commanders’ complaints of quality deterioration among their personnel. JPM,
in other words, was directed toward testing the connection between the
first and third points in our model: test data collected for selection and
classification purposes at recruitment, and field performance data. JPM
did not set out to establish a link between classroom performance and
field performance.


JPM concluded that suitable measures of field performance did not exist,
and undertook to develop them. Over several years, some highly reliable
hands-on performance tests were developed and administered for 25
occupational specialties across the four services. Surrogates for
hands-on testing were also developed, including more traditional
job-knowledge tests and performance ratings. JPM concluded that AFQT reliably
predicted differences in levels of actual field performance, and that
these differences tended to persist through a recruit’s enlistment. JPM,
however, has not reported any analyses of sex- or race-related
differences. Because of its ASVAB orientation, the project also has not
addressed the issue of the classroom/field-performance connection.

JPM performance measures were expensive to develop and frequently
costly to administer, and they therefore may not be suitable for more
routine use as measures of training effectiveness. However, the
investment made to develop these measures and their surrogates could prove
more profitable if some of the measures developed and the lessons
learned in the JPM effort were more widely applied to the development
of realistic assessment procedures for training.


Summary and
Conclusions


Our third evaluation question asked to what extent the services’
selection criteria and training evaluation measures predict success in high
technology roles. While we identified a multitude of evaluation-related
activities in the three services, we nevertheless concluded that
insufficient data existed for us to respond to this question. Army SQT data can
be adapted for this purpose, but neither the Navy nor the Air Force
routinely collects and maintains field performance data to evaluate
individual-level training effectiveness.

Our analysis of Army SQT data was hindered by the limited size of the
sample. We were able to derive some preliminary conclusions,
however: namely, that classroom performance, as measured by SQT, is a
moderately strong indicator of future field performance for males, but
not for females, and that ASVAB can predict SQT’s moderately well for
white male recruits, but is apparently unrelated to SQT scores achieved
by women and minorities. These ASVAB/SQT findings are consistent with
the pattern of ASVAB/course-grade relationships we discussed in the
previous chapter.

The lack of other objective, systematically collected field evaluation
data renders meaningful evaluation of training effectiveness impossible.
Decisionmakers, whether they are in the Congress, DOD, or the
individual services, can only react to problems in the field after they have
become apparent and have been identified as training-related. However,
given the cost and complexity of today’s military equipment, it is
imperative that the services possess adequate evaluative data to monitor how
well personnel are being prepared to use and maintain these weapons.


Chapter 5

Summary, Recommendations, and Agency
Comments and Our Response


Summary


Our  report  has  addressed  three  evaluation  questions: 

How has the aptitude of recruits for technologically sophisticated
specialties changed since 1980?

How useful are the data collected by the services before and during
classroom training for selecting individuals for high technology roles
and for evaluating the effectiveness of this training?

How well do the services’ selection criteria and training evaluation
measures predict success in high technology roles?

To  respond  to  these  questions,  we  examined  the  three  essential  types  of 
information  that  could  be  used  to  assess  the  effectiveness  of  military 
training:  (1)  data  collected  at  entry  to  the  military  for  selection  and 
assignment  to  an  occupational  specialty,  (2)  data  on  classroom  measures 
of  performance  during  formal  training,  and  (3)  data  on  individual  field 
performance.  Our  analysis  has  been  set  in  the  context  of  a  recruit  pool 
shifting  toward  a  much  higher  representation  of  women  and  minorities. 

To answer the first question, we examined ASVAB scores during the
1980’s and found that (1) most gains in recruit quality occurred in the
first half of the decade, (2) technical abilities of recruits have begun to
decline, and (3) women and minorities continue to score lower on
technical measures than white males. These findings suggest that an
increased burden will be placed on the services’ training establishments
to assure the technical competence of their future graduates. The
services’ response may also need to include more demographically sensitive
training and/or additional compensatory training to raise basic skill
levels.

Our response to the second question involved an analysis of classroom
grades from thirteen technical courses. Our findings indicated that (1)
some deficiencies exist in the Army’s computerized grading system; (2)
during training women and minorities overcome their initially lower
technical scores in the Navy and Air Force, but not in the Army; (3)
classroom success appears more related to a general ability level as
measured by ASVAB than to the Electronics Composite score currently in use,
particularly for women; and (4) ASVAB’s ability to predict classroom
success for minorities is weak.

The last three findings are interrelated. In the Navy and Air Force,
unlike in the Army, women entered training with significantly higher AFQT
scores than men. In addition, the gap in AFQT scores between whites and
nonwhites was twice as large for Army trainees as for their Navy and
Air  Force  counterparts.  Based  on  these  findings,  we  concluded  that  the 
services should consider developing a more general ASVAB derivative,
such  as  our  factor  score,  to  assign  women  and  minorities  to  technical 
training. 

We found that there was insufficient evidence to attribute the weak
relationship between ASVAB and course grades for women and minorities
either to problems with ASVAB or to factors in the training environment.
Yet, whatever its source, the relative inconsistency of the two measures
exists and should be addressed by both the recruiting and training
communities.

In response to the third question, we examined post-classroom measures
of training effectiveness. We concluded that (1) only the Army routinely
collects data on individual field performance useful for training
evaluation purposes; (2) on the basis of these Army data, ASVAB scores are even
weaker predictors of field performance for women and minorities than
of classroom success; and (3) the Navy’s training evaluation component
is in need of more intense review and reform than it is currently
receiving.

In summary, we found serious weaknesses or gaps at each of the data
points required by the evaluation model posited in chapter 1. Of these,
the most serious deficiency is the inability of the Air Force and Navy to
base their evaluation of their selection procedures and classroom
training on systematically collected, objective field performance data.
Without the ability to test the “fit” of these data points with one
another, the services are not able to maximize their training
effectiveness, or even to estimate realistically how successful their training
investment is in producing skilled operators and maintainers of
today’s (and tomorrow’s) sophisticated weaponry.


Recommendations 


We believe that evaluating the effectiveness of the training provided by
the services is crucial if they are to meet the future challenges of
changing recruit demographics and increasingly sophisticated
weaponry. Therefore, we make the following recommendations for action at
each of the three information collection points that we consider essential
to adequate training evaluation: (1) that the Office of Force
Management and Personnel direct the personnel research it coordinates among
the individual services to identify more sensitive predictors of classroom
performance for women and minority students from the ASVAB data it
already possesses; (2) that the Secretary of the Army direct the Training
and Doctrine Command to review the classroom grading procedures
identified within the report as deficient, for their accuracy,
appropriateness, and reliability; and (3) that the Secretary of the Navy establish a firm
deadline for developing a training evaluation program and that he direct
that the adequacy of current resources allocated to this effort be
reexamined. Finally, we recommend that the Assistant Secretary of Defense
for Force Management and Personnel review alternative measures of
field performance already developed by the services under the Job
Performance Measurement project for their potential applicability to
training and on-the-job performance evaluation.

Our  purpose  in  this  study  has  been  to  review  the  ability  of  the  services 
to  monitor,  evaluate,  and  (where  necessary)  adjust  training  to  changes 
in  the  demographics  and  technical  ability  of  the  recruit  pool  and  to  the 
technical  sophistication  of  weapons  systems.  Whatever  changes  in  our 
military  posture  are  occasioned  by  shifts  in  the  nature  of  threats  to  our 
national  security,  we  believe  that  accurate  information  relating  to  the 
recruit  pool,  to  the  effectiveness  of  military  training,  and  to  on-the-job 
performance  will  continue  to  be  essential  to  the  mission  of  our  armed 
forces. 


Agency  Comments  and 
Our  Response 


In its written response to a draft of this report, DOD concurred with all of
our recommendations and identified specific actions to be taken toward
implementing them. DOD also concurred or partially concurred with what
it identified as the main findings contained in the report. DOD also raised
some technical methodological questions and offered some thoughtful
interpretations of our findings. (See appendix V.) We have reviewed
these comments and, where appropriate, have made changes to the text.


DOD generally agreed with our description of changes in recruits’ ASVAB
scores during the past decade. It commented, however, that it would be
inappropriate to define a recruit’s technological sophistication merely as
his or her Electronics Composite score. We agree that this would be a
very limited definition, and for this reason our report encouraged the
development of better predictors of success in more technologically
demanding occupational specialties. DOD’s speculation that the decline in
Electronics Information scores is attributable to a decline in technical
vocational education in high schools is persuasive. It could as well have
speculated that the lower Electronics Composite scores of women
recruits are attributable to their traditionally lower enrollment in such
courses.

DOD generally concurred with our analysis of classroom grades and their
relationship to ASVAB predictors. However, it questioned the
appropriateness of some of our procedures. DOD summarized its methodological
concerns as (1) inappropriate pooling of grades from courses with different
metrics, (2) implausibly high factor scores after correction for
restriction in range, (3) lack of detailed regression analyses for differences
between subgroups, and (4) small sample sizes for subgroups.

DOD incorrectly assumes that we simply pooled raw course grades from
different courses. Before performing correlation analyses, we
standardized course grades to a common metric to adjust for any differences
between courses in grading procedures. We have also added to the draft
we provided DOD parallel tables of results on the individual-course level.
(See appendixes II and III.)
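
The standardization step described above can be sketched as follows. This is a minimal illustration, assuming that "a common metric" means within-course z-scores; the report does not state the exact transformation used, and the course names and grades below are hypothetical.

```python
from statistics import mean, pstdev


def standardize_within_course(grades_by_course):
    """Convert raw grades to z-scores separately within each course.

    Assumption: the common metric is a within-course z-score
    (mean 0, standard deviation 1). The report does not specify this.
    """
    z_by_course = {}
    for course, grades in grades_by_course.items():
        m = mean(grades)
        s = pstdev(grades)
        if s == 0:
            # No spread: the grades do not discriminate among trainees.
            z_by_course[course] = [0.0 for _ in grades]
        else:
            z_by_course[course] = [(g - m) / s for g in grades]
    return z_by_course


# Hypothetical raw grades on two different scales (not data from the report).
raw = {
    "course_A": [70.0, 80.0, 90.0],  # percentage scale
    "course_B": [2.0, 3.0, 4.0],     # 4-point scale
}
z = standardize_within_course(raw)

# After standardization the two courses can be pooled for correlation analysis.
pooled = [value for course in sorted(z) for value in z[course]]
```

In this toy case the two courses yield identical z-scores even though their raw scales differ, which is the property that makes pooling defensible.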

We share DOD’s concern about the apparently inflated values of the
adjusted validity coefficients for factor scores, but we disagree with
its speculation that inappropriate statistical procedures are the source
of this inflation. We applied the same conventional adjustment
procedures to all three scores (AFQT, Electronics Composite, and factor
scores) and, as DOD comments, for the first two scores our results “are
consistent with other analyses.” As we stated in the draft report, the
factor scores were based on the ASVAB norm group correlation matrix
provided us by DOD. Having performed a principal-components analysis
of these data, we applied the resultant scoring coefficients to our sample
to obtain factor scores. This procedure ideally offers two advantages.
First, it bases the correlation analysis on a norm group presumably
closer to the universe of applicants to military service than our sample
of relatively high-scoring recruits. Second, it permits adjustment for
restriction of range.
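
The scoring procedure described above can be sketched as follows. This is an illustration only: it assumes the factor score is the standardized first principal component, it finds that component by power iteration (a choice made here for self-containment, suitable when all intercorrelations are positive), and the 3-by-3 correlation matrix is hypothetical rather than the norm-group matrix DOD supplied.

```python
import math


def first_principal_component(corr, iterations=200):
    """Leading eigenvalue and unit eigenvector of a correlation matrix.

    Uses power iteration starting from a uniform vector.
    """
    n = len(corr)
    vector = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in product))
        vector = [x / eigenvalue for x in product]
    return eigenvalue, vector


def scoring_coefficients(corr):
    """Weights that give the component unit variance in the reference group."""
    eigenvalue, vector = first_principal_component(corr)
    return [x / math.sqrt(eigenvalue) for x in vector]


def factor_score(z_scores, coefficients):
    """Weighted sum of one person's standardized subtest scores."""
    return sum(c * z for c, z in zip(coefficients, z_scores))


# Hypothetical norm-group correlation matrix for three subtests.
norm_corr = [
    [1.0, 0.6, 0.6],
    [0.6, 1.0, 0.6],
    [0.6, 0.6, 1.0],
]
eigenvalue, vector = first_principal_component(norm_corr)
coefficients = scoring_coefficients(norm_corr)

# Apply the norm-group coefficients to one hypothetical recruit.
score = factor_score([1.0, 0.5, -0.5], coefficients)
```

Coefficients derived this way describe the reference group, so applying them to a sample with a different correlation structure changes the variance of the resulting scores.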

After thorough reexamination of our procedures and the data to which
they were applied, we concluded that the results of factor analysis of
the DOD correlation matrix should not be applied to our sample because
of differences between the two samples in the magnitude of subtest
intercorrelations. DOD reported substantially higher intercorrelations
than were present in our sample. As a result, the variance of our
sample’s factor scores, when based on the DOD correlations, was
inappropriately restricted, and the adjustment for range restriction was
overestimated. (All other things being equal, the smaller the sample variance,
the greater the adjustment for restriction in range.)
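
The mechanism described above can be illustrated with one conventional correction for direct range restriction, often called Thorndike Case II. The report does not say which formula was used, so the formula choice is an assumption, and every number below is hypothetical.

```python
import math


def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    """Thorndike Case II correction for direct restriction on the predictor.

    Assumption: this is only one conventional formula; the report does
    not identify its adjustment procedure.
    """
    k = sd_unrestricted / sd_restricted
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)


def composite_variance(weights, corr):
    """Variance of a weighted sum of standardized variables."""
    n = len(weights)
    return sum(weights[i] * weights[j] * corr[i][j]
               for i in range(n) for j in range(n))


# The same observed correlation, corrected under mild and severe restriction.
observed_r = 0.40
mild = correct_for_range_restriction(observed_r, 1.0, 0.9)
severe = correct_for_range_restriction(observed_r, 1.0, 0.6)

# Hypothetical weights scaled to unit variance when subtests intercorrelate 0.6.
weights = [1.0 / math.sqrt(6.6)] * 3
high_corr = [[1.0, 0.6, 0.6], [0.6, 1.0, 0.6], [0.6, 0.6, 1.0]]
low_corr = [[1.0, 0.3, 0.3], [0.3, 1.0, 0.3], [0.3, 0.3, 1.0]]

norm_var = composite_variance(weights, high_corr)
sample_var = composite_variance(weights, low_corr)

# The shrunken sample variance feeds a larger correction factor.
corrected = correct_for_range_restriction(observed_r, 1.0, math.sqrt(sample_var))
```

Under these assumptions, weights tuned to highly intercorrelated subtests give a smaller score variance in a sample with weaker intercorrelations, and the smaller variance in turn produces a larger adjusted coefficient.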

We therefore have recalculated our factor scores, deriving them from a
principal-component analysis of our sample’s ASVAB scores rather than
from an analysis of the norm-group correlation matrix provided by DOD.
Consequently, no adjustment for restriction of range would be
appropriate for these scores. While the correlations of these factor scores with
our criterion measures vary somewhat from those originally reported
(being in some cases higher and in others lower), the slight differences in
no way affect the conclusion that we reached in the draft report and
with which DOD has agreed in both written and oral comments: namely,
that a broader-based measure than the simple composites currently in
use would provide a valuable predictor of classroom performance.

DOD cites the absence of certain regression-related statistics (intercepts,
regression coefficients, and standard errors of estimates) and the small
sample size in some subgroups as reasons for not “generalizing to other
samples” or “making policy decisions” on the basis of our report. First,
for simple bivariate relationships such as we analyzed (ASVAB versus
course grades or SQT), our detailed reporting of means, N’s, correlation
coefficients, and significance levels serves essentially the same function
as these equivalent regression statistics. We would, however, gladly
provide our data base to DOD for alternative analysis. Second, we repeatedly
draw the reader’s attention to the problem of small sample size in some
subgroups. Most importantly, we strongly agree that, unless they are
replicated on larger samples, our analyses should not be the basis for
significant policy shifts in selection and classification of recruits.
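
The link between correlation summaries and regression statistics for a simple bivariate relationship can be sketched as follows. The sketch assumes standard deviations are available in addition to the means, N’s, and correlations listed above (the passage does not mention them), and all input values are hypothetical.

```python
import math


def regression_from_summary(mean_x, mean_y, sd_x, sd_y, r, n):
    """Slope, intercept, and standard error of estimate for y regressed on x.

    Assumes sd_x and sd_y are sample standard deviations (n - 1 divisor).
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    # Residual sum of squares is (1 - r^2)(n - 1)sd_y^2, spread over n - 2 df.
    see = sd_y * math.sqrt((1.0 - r * r) * (n - 1) / (n - 2))
    return slope, intercept, see


# Hypothetical predictor (aptitude score) and criterion (course grade).
slope, intercept, see = regression_from_summary(
    mean_x=230.0, mean_y=89.0, sd_x=20.0, sd_y=5.0, r=0.40, n=102
)
```

With these inputs the slope is 0.1 grade points per aptitude point and the intercept is 66, so the regression line carries no information beyond the summary statistics it was built from.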

Rather, we recommended (and DOD concurred) that the services attempt
to develop more sensitive predictors of training success for minorities
and women. (Indeed, one of the main strengths of our work here is that
it determined the insensitivity to these populations of current
predictors.) Should the results of these efforts prove successful, policy
changes would then be appropriate.

The Army found “neither surprising nor particularly disturbing” the
fact that we were not able to use many of the test scores it provided
for some courses because they do not discriminate among soldiers’
performances. We would point out that (1) the same software and report
formats are used to assign scores to trainees in these courses as in other
similar courses where we found usable scores; (2) we were able for some
of these cases to reanalyze the individual measures and derive
meaningful scores; and (3) the Army assigns and maintains rank-in-class
statistics for each graduate of these courses on the basis of this software,
thus itself implicitly measuring and recording the relative performance
of individuals. While our ability to perform correlational analyses may
not be a critical need, in our opinion the Army’s ability to perform
objective evaluations of the effectiveness of its courses is. We therefore
welcome the concurrence of the Army in our recommendation to review its
testing procedures for the courses we identified.

DOD commented on our review of field measures of training effectiveness
for each of the services, asserting that our negative view of ASVAB scores
as a predictor of performance for female and minority soldiers was
contrary to research on predicting training success. Not only does DOD
provide no specifics on this research but also, and more importantly, it is
not clear how predicting training outcomes is directly relevant to the
issue of field performance. Of more interest are the preliminary results
reported from ongoing research by the Army Research Institute. These
results suggest a fairly strong relationship for women and a somewhat
weaker, but still significant, relationship for blacks between ASVAB and
SQT in larger occupational specialties. The Army appears to concede that
these results may not be true for smaller, more technical specialties,
such as the ones we examined. What is most noteworthy about the
Army’s response, however, is its capability to perform these analyses of
field performance routinely, a capability that the Navy and Air Force do
not share.

The Navy supplied some information on recent steps being taken to
enhance training evaluation methods in addition to the ones we
identified in the report. The Air Force commented that it does not have SQT’s
and does not plan to introduce them in the near future. It noted that
“testing, recoding, and documenting individual performance for
statistics is very time-consuming, requires additional manpower, and is
cost-prohibitive.” It would be difficult to agree with the Air Force that
determining the effectiveness of individual performance is merely a
statistical endeavor, or even that it is an optional one. Rather, it lies at the
core of our ability to know how well we are prepared for meeting critical
defense challenges. Indeed, given the cost and complexity of today’s
military equipment, it is imperative that all the services possess adequate
evaluative data to monitor how well personnel are being trained to use
and maintain these weapons. Our report does not propose the
introduction of SQT’s into other services, nor does it attempt to determine the
cost-effectiveness of SQT’s. It does, however, assert the need for
objective, systematically collected information on individual field
performance in all services.

Finally, DOD noted that it had directly addressed the applicability of
lessons learned from the Joint-Service Job Performance Measurement
Program in 1985, but had deferred implementing any training-related
application of these measures at that time. DOD states that it will explore
the feasibility of such an application once again.




GAO/PEMD-91-1  Military  Technical-Training  Effectiveness  Is  Unknown 


Appendix I

AFQT  Mean  Score  and  Electronics  Composite 
Summary  Statistics:  1981-89 


Table I.1: AFQT Mean Scores, by Gender (a)

           Male              Female
Year    Number    Mean    Number    Mean
1981   163,571  203.95    22,886  202.95
1982   222,726  206.26    30,311  209.10
1983   227,161  209.51    32,546  211.57
1984   226,975  210.36    32,026  211.15
1985   222,772  211.55    35,368  211.43
1986   254,030  211.94    37,175  212.73
1987   239,122  212.17    35,385  212.42
1988   213,493  212.64    32,682  212.04
1989   217,783  211.83    35,984  211.78

(a) Sum of subtest standard scores.

Table I.2: AFQT Mean Scores, by Service (a)

           Army              Navy            Air Force        Marine Corps
Year    Number    Mean    Number    Mean    Number    Mean    Number    Mean
1981    76,284  195.52    47,715  208.61    37,389  213.12    25,069      --
1982   108,063  201.73    55,182  210.06    57,442  212.86    32,350  205.84
1983   121,112  206.07    55,256  212.52    51,771  216.72    31,568  207.78
1984   118,287  207.07    57,214  211.85    50,235  218.45    33,265      --
1985   111,625  209.30    59,604  211.92    57,617  217.08    29,294  208.34
1986   125,918  210.33    68,891  210.30    62,372  217.08    34,024  211.44
1987   120,538  210.73    66,078  210.75    54,371  218.10    33,520  210.90
1988   102,709  210.88    69,080  211.58    40,087      --    34,299  210.93
1989   106,126  209.42    73,272  210.40    42,247  220.59    32,122  211.45

(a) Sum of subtest standard scores.

Table I.3: AFQT Mean Scores, by Race/Ethnicity (a)

           White             Black            Hispanic           Other
Year    Number    Mean    Number    Mean    Number    Mean    Number    Mean
1981   138,431  209.27    35,666  186.56     6,904  191.00     5,456  194.95
1982   189,134  211.48    48,377  190.86     8,569  193.97     6,957      --
1983   196,585  214.19    47,540  194.54     8,616  198.71     6,966      --
1984   193,193  215.07    48,500  194.99     9,439  199.46     7,869      --
1985   190,243  215.79    49,663  197.97     9,504  202.32     8,730      --
1986   212,661  215.94    56,150  199.20    12,059  204.26    10,335      --
1987   198,130  216.62    54,166  198.67    13,708  205.00     8,503      --
1988   174,501  217.16    50,370  199.14    13,567  205.92     7,737      --
1989   177,111  216.40    53,409  199.07    15,499  205.92     7,748      --

(a) Sum of subtest standard scores.

Table I.4: AFQT Mean Score Overall Totals (a)

         Overall total
Year    Number  Mean (b)
1981   186,457    203.83
1982   253,037    206.60
1983   259,707    209.77
1984   259,001    210.41
1985   258,140    211.53
1986   291,205    211.90
1987   274,507    212.21
1988   246,175    212.56
1989   253,767    211.82

(a) Sum of subtest standard scores.

(b) Standard deviation = 20.66.
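
The overall means in table I.4 can be cross-checked against the subgroup means in table I.1, because an overall mean is the count-weighted average of its subgroup means. A minimal sketch for 1981, using the male and female figures from table I.1 (the function name is illustrative):

```python
def pooled_mean(groups):
    """Count-weighted mean of subgroup means; groups is a list of (n, mean)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# 1981 AFQT figures from table I.1: (number, mean) for males and females.
groups_1981 = [(163571, 203.95), (22886, 202.95)]
total_1981 = sum(n for n, _ in groups_1981)
overall_1981 = pooled_mean(groups_1981)
```

For 1981 the counts sum to 186,457 and the pooled mean rounds to 203.83, matching the table I.4 entry.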

Table I.5: Electronics Composite Mean Scores, by Gender (a)

           Male              Female
Year    Number    Mean    Number    Mean
1981   163,571  207.89    22,886  194.41
1982   222,726  210.00    30,311  199.18
1983   227,161  212.91    32,546  201.52
1984   226,975  213.46    32,026  201.40
1985   222,772  212.70    35,368  199.57
1986   254,030  211.76    37,175  200.57
1987   239,122  212.17    35,385  200.57
1988   213,493  212.73    32,682  199.43
1989   217,783  211.50    35,984  199.97

(a) Sum of subtest standard scores.


Table I.6: Electronics Composite Mean Scores, by Service (a)

           Army              Navy            Air Force        Marine Corps
Year    Number    Mean    Number    Mean    Number    Mean    Number    Mean
1981    76,284  198.22    47,715  209.76    37,389  215.75    25,069  208.27
1982   108,063  204.03    55,182  210.33    57,442  215.24    32,350  207.90
1983   121,112  207.92    55,256  212.16    51,771  218.34    31,568  210.00
1984   118,287  208.56    57,214  211.69    50,235  219.87    33,265      --
1985   111,625  208.66    59,604  209.66    57,617  216.77    29,294  208.17
1986   125,918  208.73    68,891  207.32    62,372  215.48    34,024      --
1987   120,538  203.79    66,078  208.55    54,371  217.21    33,520  209.36
1988   102,709  209.11    69,080  208.71    40,087  219.01    34,299  209.53
1989   106,126  207.19    73,272  207.29    42,247  218.69    32,122  209.65

(a) Sum of subtest standard scores.

Table I.7: Electronics Composite Mean Scores, by Race/Ethnicity (a)

           White             Black            Hispanic           Other
Year    Number    Mean    Number    Mean    Number    Mean    Number    Mean
1981   138,431  212.47    35,666  186.45     6,904  193.40     5,456  197.91
1982   189,134  214.51    48,377  190.01     8,569  196.37     6,957  201.33
1983   196,585  216.81    47,540  193.24     8,616  200.93     6,966  204.31
1984   193,193  217.53    48,500  193.49     9,439  201.35     7,869      --
1985   190,243  216.28    49,663  193.94     9,504  202.50     8,730  205.87
1986   212,661  215.50    56,150  194.11    12,059  203.07    10,335  205.78
1987   198,130  216.19    54,166  193.50    13,708  203.76     8,503  207.23
1988   174,501  216.86    50,370  194.08    13,567  204.54     7,737      --
1989   177,111  215.64    53,409  193.46    15,499  203.66     7,748      --

(a) Sum of subtest standard scores.

Table I.8: Electronics Composite Mean Score Overall Totals (a)

         Overall total
Year    Number  Mean (b)
1981   186,457    206.04
1982   253,037    208.44
1983   259,707    211.15
1984   259,001    211.59
1985   258,140    210.65
1986   291,205    209.97
1987   274,507    210.47
1988   246,175    210.67
1989   253,767    209.45

(a) Sum of subtest standard scores.

(b) Standard deviation = 22.19.



Appendix  II 


Predictor  and  Criterion  Variable  Mean  Scores 


Table II.1: Army Mean Scores

                                Electronics
                AFQT (a)       Composite (a)     Course grade        SQT (b)
Category         Mean  Number     Mean  Number     Mean  Number     Mean  Number
24J            227.87      65   234.75      65    86.75      76    82.58      53
27N            226.73     100   232.85     100    88.78     138    83.95     110
29V            238.22     136   242.92     136    93.55      41    76.98      65
Male           232.14     280   238.46     280    89.23     232    82.12     209
Female         232.87      23   230.13      23    80.31      23    77.52      21
White          234.00     255   240.00     255    90.19     160    81.86     144
Nonwhite       222.67      48   226.29      48    86.86      95    81.45      86
All Army       232.20     303   237.83     303    88.94     255    81.70      --

(a) Sum of subtest standard scores.

(b) Score on Skills Qualification Test.

Table II.2: Navy Mean Scores

                                Electronics
                AFQT (a)       Composite (a)     Course grade
Category         Mean  Number     Mean  Number     Mean  Number
AQ             228.10     783   233.13     783    89.72     833
AX             231.64     392   236.16     392    90.64     469
STG            228.57   3,233   234.43   3,233    90.23   3,418
STS            231.87   1,698   237.47   1,698    86.89   1,723
Male           229.59   6,080   235.33   6,080    89.11   5,882
Female         235.59      76   230.65      76    90.70      71
White          230.49   5,355   236.25   5,355    89.20   5,179
Nonwhite       224.18     801   228.74     801    89.57   1,159
All Navy       229.67   6,156   235.27   6,156    89.30   6,443

(a) Sum of subtest standard scores.

Table II.3: Air Force Mean Scores

                                Electronics
                AFQT (a)       Composite (a)     Course grade
Category         Mean  Number     Mean  Number     Mean  Number
45530A         235.53     119   240.72     119    90.17     119
45530B         235.93     231   240.55     231    90.82     231
30332          238.12     212   245.00     212    91.77     227
30333          234.15     360   239.77     360    91.31     377
Male           235.45     824   241.94     824    91.31     854
Female         237.73      98   235.88      98    89.91     100
White          236.22     825   241.95     825    91.21     855
Nonwhite       231.19      97   235.73      97    90.76      90
All Air Force  235.68     922   241.29     922    91.16     954

(a) Sum of subtest standard scores.


Appendix III

Intercorrelation of Study Variables by
Occupational Specialty


Table  111.1:  Study 

Variables:  Army,  24J*  Electronics  Grade8 


Category 

AFQTb 

Composite' 

Factor' 

Raw 

Adjusted' 

Tola! 

AFQT 

1.00 

0.793 

0.833 

0319 

0.499 

Electronics  Composite 

65 

1.00 

0.819 

0.329 

0.339 

Factor 

65 

65 

1.00 

0.409 

Grade 

59 

59 

59 

1.00 

Male 

AFQT 

1.00 

0.823 

0.859 

0.293 

0479 

Electronics  Composite 

55 

1.00 

0.799 

0.289 

0309 

Factor 

55 

55 

1.00 

0389 

Grade 

50 

50 

50 

1.00 

Female 

AFQT 

1.00 

0.819 

0.899 

0.43 

0.63 

Electronics  Composite 

10 

1.00 

0  889 

0.15 

0.15 

Factor 

10 

10 

1.00 

0.21 

Grade 

9 

9 

9 

1.00 

White 

AFQT 

1.00 

0.829 

0.809 

024 

039 

Electronics  Composite 

49 

1.00 

0.799 

0.27 

0.29 

Factor 

49 

49 

1.00 

042= 

Grade 

44 

44 

44 

1.00 

Nonwhite 

AFQT 

1.00 

0.613 

0  809 

0.13 

0.23 

Electronics  Composite 

16 

1.00 

0.849 

6.15 

016 

Factor 

16 

16 

1.00 

0.17 

Grade 

15 

15 

15 

1.00 

“Correlation  coefficients  are  in  upper  diagonal  and  number  in  lower  diagonal 


”AFQT  ■»  sum  ol  subtest  standard  scores 

'Electronics  Composite  ■  sum  ol  subtest  standard  scores  lor  Electronics  Composite 

'Factor  =  score  (rom  lust  factor  from  principal  component  analysis 

'Grade  =  final  course  grade 

'Adjusted  "  correlation  adjusted  for  restriction  ol  range 

Op  <  .05 
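The "Adjusted" columns in these tables report correlations corrected for restriction of range. The tables do not state which correction formula was applied; the sketch below assumes the commonly used Thorndike Case 2 formula for direct selection on the predictor. The function name and the standard-deviation ratio of 1.7 are hypothetical and are not taken from the report.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Thorndike Case 2 correction (assumed here, not confirmed by the report).

    r_restricted: correlation observed in the selected (restricted) sample.
    sd_ratio: predictor SD in the unrestricted population divided by the
              predictor SD in the selected sample (>= 1 when selection
              narrows the range).
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    r2 = r_restricted ** 2
    return (r_restricted * sd_ratio) / math.sqrt(1.0 - r2 + r2 * sd_ratio ** 2)


# Hypothetical illustration: a raw correlation of 0.31 and an SD ratio of 1.7.
print(round(correct_for_range_restriction(0.31, 1.7), 2))
```

With an SD ratio of 1 the function returns the raw correlation unchanged; larger ratios raise the corrected value, which is why adjusted coefficients in the tables are never smaller than the raw ones.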
Table III.2: Intercorrelation of Study
Variables: Army, 27Na

                                 Electronics            Gradee
Category                 AFQTb   Compositec   Factord   Raw     Adjustedf
Total
  AFQT                   1.00    0.84g        0.85g     0.36g   0.55g
  Electronics Composite  100     1.00         0.92g     0.53g   0.57g
  Factor                 100     100          1.00      0.48g
  Grade                  95      95           95        1.00
Male
  AFQT                   1.00    0.86g        0.85g     0.39g   0.59g
  Electronics Composite  94      1.00         0.93g     0.52g   0.56g
  Factor                 94      94           1.00      0.48g
  Grade                  89      89           89        1.00
Female
  AFQT                   1.00    0.86g        0.82g     0.84g   0.94g
  Electronics Composite  6       1.00         0.96g     0.88g   0.93g
  Factor                 6       6            1.00      0.90g
  Grade                  6       6            6         1.00
White
  AFQT                   1.00    0.82g        0.82g     0.31g   0.49g
  Electronics Composite  85      1.00         0.90g     0.49g   0.52g
  Factor                 85      85           1.00      0.43g
  Grade                  81      81           81        1.00
Nonwhite
  AFQT                   1.00    0.80g        0.81g     0.31    0.49
  Electronics Composite  15      1.00         0.93g     0.65g   0.69g
  Factor                 15      15           1.00      0.62g
  Grade                  14      14           14        1.00

aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores
cElectronics Composite = sum of subtest standard scores for Electronics Composite
dFactor = score from first factor from principal component analysis
eGrade = final course grade
fAdjusted = correlation adjusted for restriction of range
gp < .05
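Whether a coefficient carries the p < .05 flag depends on the number of cases as well as on its size, which is why a raw AFQT-grade correlation of 0.31 is unflagged for the 14 nonwhite 27N trainees while 0.36 is flagged for all 95. The tables do not say which significance test was used; the sketch below assumes the Fisher z approximation, and the function name is ours.

```python
import math
from statistics import NormalDist


def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 using Fisher's z.

    This is an assumed test, chosen for illustration; the report does not
    state how its significance flags were computed.
    """
    if n <= 3:
        raise ValueError("Fisher's z needs more than 3 cases")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Values from Table III.2 (AFQT with raw course grade).
print(fisher_p_value(0.36, 95) < 0.05)   # Total row
print(fisher_p_value(0.31, 14) < 0.05)   # Nonwhite row
```

Under this approximation the first comparison is significant at the .05 level and the second is not, matching the flags in the table.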
Table III.3: Intercorrelation of Study
Variables: Army, 29Va

                                 Electronics            Gradee
Category                 AFQTb   Compositec   Factord   Raw     Adjustedf
Total
  AFQT                   1.00    0.74g        0.79g     0.20    0.33
  Electronics Composite  136     1.00         0.88g     0.50g   0.53g
  Factor                 136     136          1.00      0.38g
  Grade                  35      35           35        1.00
Male
  AFQT                   1.00    0.75g        0.80g     0.25    0.41
  Electronics Composite  129     1.00         0.88g     0.47g   0.50g
  Factor                 129     129          1.00      0.36g
  Grade                  32      32           32        1.00
Female
  AFQT                   1.00    0.83g        0.80g     0.59    0.78
  Electronics Composite  7       1.00         0.90g     0.79    0.84
  Factor                 7       7            1.00      0.57
  Grade                  3       3            3         1.00
White
  AFQT                   1.00    0.74g        0.78g     0.20    0.33
  Electronics Composite  119     1.00         0.87g     0.53g   0.56g
  Factor                 119     119          1.00      0.40g
  Grade                  29      29           29        1.00
Nonwhite
  AFQT                   1.00    0.76g        0.85g     0.18    0.31
  Electronics Composite  17      1.00         0.86g     0.34    0.36
  Factor                 17      17           1.00      0.23
  Grade                  6       6            6         1.00

aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores
cElectronics Composite = sum of subtest standard scores for Electronics Composite
dFactor = score from first factor from principal component analysis
eGrade = final course grade
fAdjusted = correlation adjusted for restriction of range
gp < .05


GAO/PEMD-91-4 Military Technical-Training Effectiveness Is Unknown


Table III.4: Intercorrelation of Study
Variables: Navy, AQa

                                 Electronics            Gradee
Category                 AFQTb   Compositec   Factord   Raw     Adjustedf
Total
  AFQT                   1.00    0.83g        0.85g     0.25g   0.40g
  Electronics Composite  783     1.00         0.86g     0.27g   0.29g
  Factor                 783     783          1.00      0.25g
  Grade                  774     774          774       1.00
Maleh
  AFQT                   1.00    0.83g        0.85g     0.25g   0.40g
  Electronics Composite  783     1.00         0.86g     0.27g   0.29g
  Factor                 783     783          1.00      0.25g
  Grade                  774     774          774       1.00
White
  AFQT                   1.00    0.83g        0.84g     0.25g   0.41g
  Electronics Composite  665     1.00         0.86g     0.28g   0.30g
  Factor                 665     665          1.00      0.27g
  Grade                  656     656          656       1.00
Nonwhite
  AFQT                   1.00    0.82g        0.86g     0.13    0.22
  Electronics Composite  118     1.00         0.83g     0.16    0.17
  Factor                 118     118          1.00      0.07
  Grade                  118     118          118       1.00

aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores
cElectronics Composite = sum of subtest standard scores for Electronics Composite
dFactor = score from first factor from principal component analysis
eGrade = final course grade
fAdjusted = correlation adjusted for restriction of range
gp < .05
hWomen are prohibited from serving in the Navy's AQ occupational specialty.
Table III.5: Intercorrelation of Study
Variables: Navy, AXa

                                 Electronics            Gradee
Category                 AFQTb   Compositec   Factord   Raw     Adjustedf
Total
  AFQT                   1.00    0.81g        0.83g     0.41g   0.61g
  Electronics Composite  392     1.00         0.89g     0.40g   0.43g
  Factor                 392     392          1.00      0.39g
  Grade                  391     391          391       1.00
Male
  AFQT                   1.00    0.87g        0.88g     0.42g   0.62g
  Electronics Composite  321     1.00         0.90g     0.43g   0.46g
  Factor                 321     321          1.00      0.41g
  Grade                  320     320          320       1.00
Female
  AFQT                   1.00    0.75g        0.80g     0.39g   0.58g
  Electronics Composite  71      1.00         0.83g     0.32g   0.34g
  Factor                 71      71           1.00      0.39g
  Grade                  71      71           71        1.00
White
  AFQT                   1.00    0.80g        0.83g     0.44g   0.65g
  Electronics Composite  336     1.00         0.89g     0.46g   0.49g
  Factor                 336     336          1.00      0.44g
  Grade                  335     335          335       1.00
Nonwhite
  AFQT                   1.00    0.78g        0.84g     0.18    0.29
  Electronics Composite  56      1.00         0.87g     0.02    0.02
  Factor                 56      56           1.00      0.07
  Grade                  56      56           56        1.00

aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores
cElectronics Composite = sum of subtest standard scores for Electronics Composite
dFactor = score from first factor from principal component analysis
eGrade = final course grade
fAdjusted = correlation adjusted for restriction of range
gp < .05
Table III.6: Intercorrelation of Study
Variables: Navy, STGa

                                 Electronics            Gradee
Category                 AFQTb   Compositec   Factord   Raw     Adjustedf
Total
  AFQT                   1.00    0.78g        0.80g     0.30g   0.48g
  Electronics Composite  3233    1.00         0.84g     0.26g   0.28g
  Factor                 3233    3233         1.00      0.28g
  Grade                  3123    3123         3123      1.00
Maleh
  AFQT                   1.00    0.78g        0.80g     0.30g   0.48g
  Electronics Composite  3233    1.00         0.84g     0.26g   0.28g
  Factor                 3233    3233         1.00      0.28g
  Grade                  3123    3123         3123      1.00
White
  AFQT                   1.00    0.79g        0.80g     0.31g   0.49g
  Electronics Composite  2791    1.00         0.84g     0.28g   0.29g
  Factor                 2791    2791         1.00      0.30g
  Grade                  2697    2697         2697      1.00
Nonwhite
  AFQT                   1.00    0.71g        0.76g     0.22g   0.37g
  Electronics Composite  442     1.00         0.78g     0.16g   0.16g
  Factor                 442     442          1.00      0.12g
  Grade                  426     426          426       1.00

aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores
cElectronics Composite = sum of subtest standard scores for Electronics Composite
dFactor = score from first factor from principal component analysis
eGrade = final course grade
fAdjusted = correlation adjusted for restriction of range
gp < .05
hWomen are prohibited from serving in the Navy's STG occupational specialty.
Table III.7: Intercorrelation of Study
Variables: Navy, STSa

                                 Electronics            Gradee
Category                 AFQTb   Compositec   Factord   Raw     Adjustedf
Total
  AFQT                   1.00    0.76g        0.78g     0.28g   0.45g
  Electronics Composite  1698    1.00         0.85g     0.26g   0.27g
  Factor                 1698    1698         1.00      0.26g
  Grade                  1651    1651         1651      1.00
Maleh
  AFQT                   1.00    0.76g        0.78g     0.28g   0.45g
  Electronics Composite  1698    1.00         0.85g     0.26g   0.27g
  Factor                 1698    1698         1.00      0.26g
  Grade                  1651    1651         1651      1.00
White
  AFQT                   1.00    0.77g        0.79g     0.28g   0.46g
  Electronics Composite  1518    1.00         0.85g     0.27g   0.29g
  Factor                 1518    1518         1.00      0.28g
  Grade                  1477    1477         1477      1.00
Nonwhite
  AFQT                   1.00    0.70g        0.68g     0.27g   0.44g
  Electronics Composite  180     1.00         0.82g     0.11    0.12
  Factor                 180     180          1.00      0.12
  Grade                  174     174          174       1.00

aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores
cElectronics Composite = sum of subtest standard scores for Electronics Composite
dFactor = score from first factor from principal component analysis
eGrade = final course grade
fAdjusted = correlation adjusted for restriction of range
gp < .05
hWomen are prohibited from serving in the Navy's STS occupational specialty.
Table III.8: Intercorrelation of Study
Variables: Air Force, 45530Aa

                                 Electronics            Gradee
Category                 AFQTb   Compositec   Factord   Raw     Adjustedf
Total
  AFQT                   1.00    0.74g        0.19g     0.22g   0.36g
  Electronics Composite  119     1.00         0.87g     0.27g   0.29g
  Factor                 119     119          1.00      0.30g
  Grade                  119     119          119       1.00
Male
  AFQT                   1.00    0.77g        0.77g     0.21g   0.35g
  Electronics Composite  99      1.00         0.86g     0.26g   0.28g
  Factor                 99      99           1.00      0.27g
  Grade                  99      99           99        1.00
Female
  AFQT                   1.00    0.69g        0.63g     0.31    0.49
  Electronics Composite  20      1.00         0.84g     0.15    0.15
  Factor                 20      20           1.00      0.25
  Grade                  20      20           20        1.00
White
  AFQT                   1.00    0.75g        0.73g     0.24g   0.39g
  Electronics Composite  102     1.00         0.87g     0.28g   0.29g
  Factor                 102     102          1.00      0.28g
  Grade                  102     102          102       1.00
Nonwhite
  AFQT                   1.00    0.58g        0.65g     0.08    0.13
  Electronics Composite  17      1.00         0.85g     0.22    0.23
  Factor                 17      17           1.00      0.33
  Grade                  17      17           17        1.00

aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores
cElectronics Composite = sum of subtest standard scores for Electronics Composite
dFactor = score from first factor from principal component analysis
eGrade = final course grade
fAdjusted = correlation adjusted for restriction of range
gp < .05
Table III.9: Intercorrelation of Study
Variables: Air Force, 45530Ba

                                 Electronics            Gradee
Category                 AFQTb   Compositec   Factord   Raw     Adjustedf
Total
  AFQT                   1.00    0.70g        0.72g     0.22g   0.36g
  Electronics Composite  231     1.00         0.83g     0.27g   0.28g
  Factor                 231     231          1.00      0.29g
  Grade                  231     231          231       1.00
Male
  AFQT                   1.00    0.71g        0.72g     0.23g   0.37g
  Electronics Composite  215     1.00         0.84g     0.25g   0.27g
  Factor                 215     215          1.00      0.29g
  Grade                  215     215          215       1.00
Female
  AFQT                   1.00    0.81g        0.83g     0.15    0.26
  Electronics Composite  16      1.00         0.71g     0.25    0.26
  Factor                 16      16           1.00      0.10
  Grade                  16      16           16        1.00
White
  AFQT                   1.00    0.70g        0.72g     0.25g   0.40g
  Electronics Composite  206     1.00         0.81g     0.32g   0.34g
  Factor                 206     206          1.00      0.35g
  Grade                  206     206          206       1.00
Nonwhite
  AFQT                   1.00    0.66g        0.65g     0.11    0.19
  Electronics Composite  25      1.00         0.90g     0.05    0.06
  Factor                 25      25           1.00      0.04
  Grade                  25      25           25        1.00

aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores
cElectronics Composite = sum of subtest standard scores for Electronics Composite
dFactor = score from first factor from principal component analysis
eGrade = final course grade
fAdjusted = correlation adjusted for restriction of range
gp < .05




Table III.10: Intercorrelation of Study
Variables: Air Force, 30332a

                                 Electronics            Gradee
Category                 AFQTb   Compositec   Factord   Raw     Adjustedf
Total
  AFQT                   1.00    0.69g        0.75g     0.39g   0.59g
  Electronics Composite  212     1.00         0.81g     0.41g   0.43g
  Factor                 212     212          1.00      0.43g
  Grade                  212     212          212       1.00
Male
  AFQT                   1.00    0.74g        0.78g     0.41g   0.61g
  Electronics Composite  186     1.00         0.82g     0.40g   0.42g
  Factor                 186     186          1.00      0.45g
  Grade                  186     186          186       1.00
Female
  AFQT                   1.00    0.62g        0.71g     0.34    0.53
  Electronics Composite  26      1.00         0.79g     0.48g   0.50g
  Factor                 26      26           1.00      0.31
  Grade                  26      26           26        1.00
White
  AFQT                   1.00    0.70g        0.77g     0.36g   0.55g
  Electronics Composite  190     1.00         0.81g     0.41g   0.43g
  Factor                 190     190          1.00      0.42g
  Grade                  190     190          190       1.00
Nonwhite
  AFQT                   1.00    0.56g        0.70g     0.62g   0.81g
  Electronics Composite  22      1.00         0.75g     0.43g   0.46g
  Factor                 22      22           1.00      0.61g
  Grade                  22      22           22        1.00

aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores
cElectronics Composite = sum of subtest standard scores for Electronics Composite
dFactor = score from first factor from principal component analysis
eGrade = final course grade
fAdjusted = correlation adjusted for restriction of range
gp < .05
Table III.11: Intercorrelation of Study
Variables: Air Force, 30333a

                                 Electronics            Gradee
Category                 AFQTb   Compositec   Factord   Raw     Adjustedf
Total
  AFQT                   1.00    0.72g        0.77g     0.32g   0.50g
  Electronics Composite  360     1.00         0.83g     0.38g   0.40g
  Factor                 360     360          1.00      0.40g
  Grade                  360     360          360       1.00
Male
  AFQT                   1.00    0.75g        0.79g     0.31g   0.49g
  Electronics Composite  324     1.00         0.84g     0.39g   0.41g
  Factor                 324     324          1.00      0.34g
  Grade                  324     324          324       1.00
Female
  AFQT                   1.00    0.58g        0.78g     0.50g   0.70g
  Electronics Composite  36      1.00         0.74g     0.22    0.24
  Factor                 36      36           1.00      0.36g
  Grade                  36      36           36        1.00
White
  AFQT                   1.00    0.71g        0.77g     0.34g   0.53g
  Electronics Composite  327     1.00         0.84g     0.38g   0.40g
  Factor                 327     327          1.00      0.35g
  Grade                  327     327          327       1.00
Nonwhite
  AFQT                   1.00    0.66g        0.68g     0.10    0.17
  Electronics Composite  33      1.00         0.70g     0.43g   0.46g
  Factor                 33      33           1.00      0.43g
  Grade                  33      33           33        1.00

aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores
cElectronics Composite = sum of subtest standard scores for Electronics Composite
dFactor = score from first factor from principal component analysis
eGrade = final course grade
fAdjusted = correlation adjusted for restriction of range
gp < .05
Specialty    Year     Number    Mean
24J          1985     154       86.48
             1986     152       87.11
             1987     102       82.50
             1988     92        83.05
             Total    500       85.23
27N          1985     196       85.53
             1986     157       88.36
             1987     145       86.66
             1988     185       79.56
             Total    683       84.81
26V/29V      1985     1,308     82.28
             1986     1,261     79.39
             1987     944       80.19
             1988     831       78.77
             Total    4,344     80.40
FORCE MANAGEMENT
AND PERSONNEL


ASSISTANT SECRETARY OF DEFENSE

WASHINGTON, D.C. 20301-4000


10 AUG 1990


Ms. Eleanor Chelimsky
Assistant  Comptroller  General 

Program  Evaluation  and  Methodology  Division 
U.S.  General  Accounting  Office 
441  G.  Street,  NW 
Washington,  DC  20548 

Dear Ms. Chelimsky:

This  is  the  Department  of  Defense  (DoD)  response  to  the 
General  Accounting  Office  (GAO)  draft  report,  "MILITARY  TRAINING: 
Effectiveness  for  Technical  Specialties  Inadequately  Measured," 
dated May 31, 1990 (GAO Code 973276, OSD Case 8371).

The  report  provides  a  series  of  useful  recommendations  that 
are  consistent  with  ongoing  DoD  initiatives  designed  to  develop 
more  sensitive  indicators  of  trainee  performance  and  to  develop 
more  cost-effective  ways  of  measuring  performance  both  in  the 
schoolhouse  and  on-the-job.  Despite  general  agreement  with  the 
report's  final  recommendations,  the  DoD  does  not  fully  concur 
with many of the specific findings. In several cases, the
findings and conclusions appear to be based on incorrect assumptions
or  inappropriate  methodology.  Specific  issues  and  details  are 
provided  in  the  enclosure. 

In addition, it is important to note that the field of job
performance measurement is still a developing science and
cost-effective measures for use in evaluating training effectiveness
are not yet available. As discussed in the enclosure, the DoD
has additional measurement programs in place beyond those
discussed in the report, and continues to support a substantial
number of research efforts to expand the boundaries of this
science. The GAO report substantiates the Department's
conclusions about the demands of selecting and training individuals to
meet the requirements of technical specialties in the coming
years, and reinforces current DoD efforts in this area.

The  DoD  appreciates  the  opportunity  to  comment  on  the  draft 
report. 


Sincerely, 


Enclosure: 
As  stated 
DoD Response: Concur. While the statements attributed to the
Services are essentially correct, they do not provide the "big
picture." Since FY 1984, quality in the Air Force has remained
stable at 98 to 99 percent high school diploma graduates and
98 to 100 percent individuals who score average or above on the
enlistment test. Simultaneously, Air Force recruiting objectives
have fallen from 60,000 in FY 1984 to 43,000 in FY 1989, making
it easier to meet its goals with high quality. Although the Navy
Delayed Entry Program pool eroded in FY 1989, it is back on
target. And while the Army did not achieve its first quarter
FY 1989 recruiting objective (enlisting all but 475 of the 24,143
people it sought), it finished FY 1989 exceeding the objective.

In  addition,  the  impact  of  the  mid-1990s  dip  in  the  size  of  the 
youth  population  will  be  moderated  by  reductions  in  accession 
requirements that are likely to be part of the overall
downsizing of the military during this decade.

The GAO report also mentions that American youth are falling
behind youth of competitor nations in "technological literacy."
While the DoD is unaware of the existence of international
"technological literacy" data, its objective is to enlist those
youth who can acquire the skills to field sophisticated weapon
systems. To that end, the education of the nation's youth is of
paramount importance to the DoD. Given students' lackluster
performance on both national and international tests over the last
decade, the DoD has formed a collaborative working arrangement
with the U.S. Department of Education, whereby the Department is
assisting them with development and fielding of new international
literacy tests. The DoD is also experimenting with those same
tests with hopes of improving the Joint-Service enlistment test.
The Department shares the GAO concern and hopes to have
much-improved international comparative literacy data over the
next several years.

FINDING B: The Quality of Military Recruits: 1981-1989 Test
Results. The GAO reported that the Armed Services Vocational
Aptitude Battery comprises ten subtests measuring abilities
considered important for Military Service. The GAO also reported
that all the Services use the same component subtests for two
composite scores: the Electronics Composite and the Armed Forces
Qualification Test, which is the primary mental criterion for
entry into the Armed Forces. The GAO found the following
regarding the Armed Forces Qualification Test:

-  overall scores improved about 4 percent between 1981
and 1989;

-  male recruit scores began and ended the decade
slightly higher than female scores;
-  scores differed more substantially across racial
groupings than between genders;

-  white recruits' scores began the decade 10 percent
higher than minority scores and ended 7 percent
higher;

-  mean scores for all Services were significantly higher
in 1989 than 1981;

-  Army scores began the decade substantially below those
of the other Services, but by 1986, had reached the
same level as Navy and Marine Corps recruits; and

-  average Air Force scores have consistently been higher
than the other Services and have not displayed their
tendency to plateau at mid-decade levels.

The GAO found the following regarding the Electronics Composite:

-  mean scores rose 2 percent between 1981 and 1989;

-  scores peaked in 1984 and have shown a gradual decline
since then;

-  female recruits scored approximately 5 percent lower
than male recruits during the eighties;

-  white recruits scored about 11 percent higher than
minorities in 1981 and 9 percent higher by 1989;

-  the narrowing of the gap for minorities, however, was
achieved in the first half of the decade; by 1989,
scores for all racial groups were declining;

-  the interservice pattern of scores mirrors that of the
Armed Forces Qualification Test, with the Army making
up a 10 point difference with the Navy and Marines by
1986, and the Air Force on top throughout; and

-  mean scores for the three Services changed very little
from 1985 to 1988, but Army and Navy scores declined
significantly in 1989. (pp. 2-1 to 2-7/GAO Draft
Report)

DoD Response: Partially concur. Although the individual
calculations have not been corroborated by the DoD due to time
constraints, trends reported in the Armed Forces Qualification Test
score data presented for comparison of groups (i.e., gender,
race/ethnicity, and Service) look reasonable, as do the trends
reported regarding the Electronics Composite. Some technical
questions suggest, however, that clarification may be necessary
in the GAO narrative.

For example, the GAO report states that Armed Forces
Qualification Test "scores improved about 4 percent between 1981
and 1989." In other statements, various percentage changes are
mentioned for the Armed Forces Qualification Test and the
Electronics Composite. Computing percentage gains or changes in
subtest standard scores is not statistically appropriate. Scores
on the Armed Services Vocational Aptitude Battery, of which the
Armed Forces Qualification Test and the Composite scores are a
part, do not have a meaningful zero point and, therefore,
percentage changes cannot be interpreted. Computation of
percentages requires a ratio scale, which is more powerful than the
score scale for all aptitude tests, including the Armed Services
Vocational Aptitude Battery. The same limitation applies to
interpreting changes on the Electronics Composite.
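The point about the missing zero point can be illustrated numerically. In the Python sketch below, the scores and the scale shift are hypothetical (they are not ASVAB data): the same 8-point gain produces very different "percentage changes" depending on where the arbitrary origin of the scale is placed, while the raw difference is unaffected.

```python
def percent_change(old, new):
    """Percentage change from old to new; only meaningful on a ratio scale."""
    return 100.0 * (new - old) / old


# Hypothetical standard-score means on an interval scale.
old_score, new_score = 200.0, 208.0

# Re-express the same scores on a scale whose origin is moved by 150 points.
shift = -150.0
shifted_old, shifted_new = old_score + shift, new_score + shift

print(percent_change(old_score, new_score))      # 4.0
print(percent_change(shifted_old, shifted_new))  # 16.0
print(new_score - old_score, shifted_new - shifted_old)  # 8.0 8.0
```

Because the origin of a standard-score scale is a convention, neither percentage has a privileged interpretation; the difference in score points does.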

Some factors related to changes in how scores have been computed
are relevant, particularly since the report examines scores
across several years. Between 1981 and 1989, there were several
changes in the Armed Forces Qualification Test (e.g., the
subtests used to compute the Armed Forces Qualification Test score
were changed and the reference population for norming of the test
was updated). It is unclear if the differences in how scores
were computed over the years were taken into account in the
analyses presented in Appendix I and Figures 1, 2, and 3;
clarification as to these differences appears appropriate, otherwise
comparisons of means will not be interpretable. The same sort of
changes occurred over the years in the calculation of the
Electronics Composite and would affect interpretation of Figures 5,
6, and 7.

Finally, with the large sample sizes achieved in the data
analyses, statistical significance can be observed for differences
that have relatively little practical significance. For example,
while the statement that "... Navy scores declined
significantly in 1989 (relative to 1988)" is true, the drop was from a
score of 211.58 in 1988 to a score of 210.40 in 1989. A drop that
small from one year to the next is worth noting, but is not
cause for alarm.
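The distinction between statistical and practical significance can be made concrete with a rough calculation. The two means below are the ones quoted in the response; the standard deviation (30 points) and the yearly sample size (50,000 recruits) are hypothetical values chosen only for illustration and do not come from the report.

```python
import math

mean_1988, mean_1989 = 211.58, 210.40  # means quoted in the DoD response
assumed_sd = 30.0                      # hypothetical score standard deviation
assumed_n = 50_000                     # hypothetical recruits per year

difference = mean_1988 - mean_1989

# Two-sample z statistic for equal group sizes and a common SD.
standard_error = assumed_sd * math.sqrt(2.0 / assumed_n)
z_statistic = difference / standard_error

# Standardized effect size (Cohen's d): the drop expressed in SD units.
effect_size = difference / assumed_sd

print(f"z = {z_statistic:.1f}, d = {effect_size:.3f}")
```

Under these assumptions the z statistic is far beyond the usual 1.96 cutoff, yet the drop is only about four hundredths of a standard deviation, well under the 0.2 conventionally labeled a "small" effect.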

FINDING C: The Quality of Military Recruits: Number of Recruits
Qualified for High Technology Specialties During the Period
1981-1989. The GAO reported that, as another measure of recruit
qualification trends, it enumerated the number of recruits whose
Armed Services Vocational Aptitude Battery scores met minimum
standards required for entry into two selected high technology
military specialties: (1) air traffic controllers and (2)
systems repair technicians. The GAO found the following for the air
traffic controller specialty:

-  in 1981, approximately 38,000 recruits qualified for
the specialty and by 1986, more than 69,000 recruits
qualified, but since then the number qualifying has
declined to 58,000;

-  in 1981, 87 percent of the qualifying recruits were
white males, while two-thirds of all recruits were
white males;

-  by 1989, 84 percent of the qualifying recruits were
white males, while only 61 percent of the recruits
were white males; and

-  while one third of the white males entering the
Service qualified on the basis of their Electronics
scores, fewer than 15 percent of the white females so
qualified and fewer than 10 percent of the minority
males and 3 percent of the minority females qualified
on the basis of their Electronics scores.

The  GAO  found  the  following  for  the  Systems  Repair  Technician: 

-  in  1981,  the  number  of  qualified  recruits  for  the 
Systems Repair Technician specialty numbered 16,563
and,  by  1983,  the  number  had  increased  sharply — but 
by  1989,  it  had  fallen  back  to  within  700  of  the  1981 
level;  and 

-  the  vast  majority  of  those  qualified  were  white 
males,  of  whom  11  percent  qualified  compared  with 
less  than  2  percent  for  other  demographic  groups. 

The GAO concluded that, based on its review, recruit quality
trends during the eighties are not reassuring. The GAO also
observed that fewer recruits are qualifying for the more
demanding technical occupational specialties. The GAO further
concluded that, with women and minorities forming the bulk of the
new entry labor force by the year 2000, providing well-trained
personnel for a technologically sophisticated military can be
expected to become increasingly difficult. The GAO also noted
that, in turn, the burden on training will increase, along with
the need to monitor its effectiveness. (pp. 2-7 to 2-11/GAO
Draft Report)

DoD Response: Partially concur. Providing well-trained
personnel will become increasingly difficult should recruit quality
diminish. However, the DoD does not consider that recruit
quality trends during the eighties, particularly the mid-to-late
1980s, are troublesome. During the last half of the decade,
recruit quality has never been better. Compared to the youth
population from which the DoD recruits, the quality level has
consistently been well above average. For example, in FY 1989,
92 percent of new recruits had a high school diploma, in contrast
to 74 percent in the youth population. Also, in FY 1989, 94
percent of new recruits scored average or above on the enlistment
test, compared to 69 percent in the youth population.

Although it is reasonable that the GAO would want to assess how
the aptitude of recruits for technologically sophisticated
specialties has changed since 1980, the methodology selected to do
so is flawed. Equating a decline on the Armed Services
Vocational Aptitude Battery's Electronics Composite to a decline in
recruits' "technological sophistication" is inappropriate. The
Electronics Composite is composed of four subtests that measure
mathematics ability (arithmetic reasoning and mathematics
knowledge), general science, and electronics information. As
Figure 8 of the report indicates, the decline in performance on the
composite is driven primarily by the decline in performance on
one subtest: electronics information.

There is also a flaw in the example used by the GAO beginning on
page 2-8, wherein the report refers to the Air Traffic Control
specialty as having a minimum entry standard as of May 1989 of
230 on the Electronics Composite (in standard score form). Air
Traffic Control, Air Force Specialty Code 272X0, is selected on
the General Composite and has never had an Electronics
requirement. That renders report Figure 9 incorrect, if based on the
composite described in the text. The GAO may have actually
performed its analyses on the specialty titled Aircraft Control
and Warning Radar Specialist, Air Force Specialty Code 303X2; in
report Table 3.7, that specialty is correctly reflected as having
an Electronics Composite qualifying score of 230.

The  other  specialty  used  by  the  GAO  in  this  finding  is  Systems 
Repair  Technician,  an  occupation  so  specialized  that  it  is  not 
assigned  an  Air  Force  Specialty  Code,  but  is  identified  by  a 
Reporting Identifier (99104). It would be appropriate for the
report  to  mention  that  individuals  qualifying  for  this  specialty 
are  not  qualified  for  a  "typical"  high-technology  job,  but  are  at 
the  very  highest  end  of  the  technical  continuum.  A  footnote 
identifying  the  specialty  and  its  cutoff  score  requirement  would 
be  appropriate,  similar  to  the  footnote  given  at  the  bottom  of 
page  2-8  for  the  other  specialty. 

It  is  speculated  that  the  test  score  decline  on  the  electronics 
information  subtest  is  attributable  to  nationwide  educational 
curriculum changes. Over the course of this decade, dramatic
changes  have  occurred  in  public  and  private  elementary  and  sec¬ 
ondary  education  programs.  These  reforms  have  been  well  publi¬ 
cized  and  documented.  As  high  school  graduation  standards  have 
become  more  stringent,  students  have  had  fewer  opportunities  to 
take  elective  coursework.  Consequently,  enrollment  in  vocational 
education  courses,  like  electronics/electricity,  has  declined 
dramatically. Throughout the 1980s, recruit quality, as measured
on  the  Armed  Services  Vocational  Aptitude  Battery's  Armed  Forces 
Qualification  Test  composite,  has  improved.  However,  as  the  GAO 
pointed  out,  performance  on  the  electronics  subtest/composite  has 
declined. Again, this is considered to be an artifact of the
educational  reform  movement.  Students  simply  are  no  longer 
enrolling  in  the  technical  and  trade  vocational  classes  where 
they  can  learn  basic  electronics/electrical  constructs. 

The  electronics  composite  is  a  valid  predictor  of  success  in 
training  and  on  the  job  for  occupational  specialties  requiring 
electronics/electrical  knowledge.  Given  that  it  is  also  known 
that  youth  are  taking  fewer  formal  courses  in  this  area  prior  to 
entry  into  the  military,  the  DoD  is  interested  in  improving  its 
ability  to  select  and  classify  recruits  into  electronics-related 
occupations.  To  that  end,  there  is  research  in  progress  to 
improve  the  content  of  the  current  enlistment  test.  A  number  of 
large-scale  research  projects,  on  both  new  paper-and-pencil  and 
computerized  tests,  are  underway  in  hopes  of  finding  better 
predictors  of  performance  in  military  training  and  occupations. 

The  Department  reiterates,  however,  that  it  is  inappropriate  to 
equate  performance  on  the  electronics  composite  with  recruits' 
overall  "technological  sophistication"  and  to  conclude  that  this 
sophistication has declined over the decade of the 1980s.
Unfortunately, there is no way to conduct a historical study on this
subject.  The  DoD  concurs  with  GAO  researchers  that  the  youth  and 
entry-level  labor  force  demographics  are  changing  and  that  the 
Department  needs  to  study  carefully  the  effects  of  its  enlistment 
test  and  concomitant  composites  on  the  people  (e.g.,  women, 
minorities)  that  will  be  recruited  in  the  future.  To  that  end, 
the  results  from  enlistment  test  research  described  above  are 
expected  to  be  helpful  in  making  future  enlistment  test  deci¬ 
sions. 

FINDING D: Schoolhouse Measures of Training Effectiveness — Army.
The  GAO  reviewed  course  grades  in  Army  advanced  individual  train¬ 
ing  courses  for  five  occupational  specialties  to  determine  the 
extent  to  which  appropriate  data  were  available  to  the  Military 
Services  for  use  in  judging  training  effectiveness.  The  GAO 
found  that  the  course  grades  for  the  five  specialties  were  not 
equally  reliable  indicators  of  performance  during  training.  The 
GAO noted, for instance, that at Fort Gordon it was unable to
find a consistent relationship between milestone measures and
final  grades,  nor  was  it  able  to  locate  anyone  who  could  suggest 
a  relationship.  The  GAO  concluded  that  the  grades  recorded  for 
two  of  the  courses  (36E  and  39B)  could  not  be  used  to  discrimi¬ 
nate  reliably  among  the  performance  of  individual  trainees.  The 
GAO  found  inconsistencies  in  scoring  between  different  classes 
and  even  within  the  same  class.  The  GAO  also  found  that  Fort 
Gordon's grades (unlike Redstone's grades) were based partially
on  measures  of  physical  conditioning  that  appeared  to  be  unre¬ 
lated  to  job  performance.  The  GAO  concluded  that  the  psychomet¬ 
ric  differences  it  found  at  Fort  Gordon  appeared  to  be  the  result 
of  a  number  of  factors  including  (1)  questionable  data  entry 
procedures  and  software  and  (2)  the  pass/fail  nature  of  the 
criteria  used  to  evaluate  student  progress.  GAO  suggested  that 
subject  matter  experts  need  to  develop  more  finely  tuned,  objec¬ 
tive,  and  reliable  measures  of  performance  than  "go/no-go."  The 
GAO  noted  that,  because  of  the  problems  encountered  at  Fort 
Gordon,  it  excluded  those  courses  from  its  sample  of  Army  train¬ 
ees,  resulting  in  the  inclusion  of  all  recruits  who  completed  24J 
and  27N  training  between  October  1987  and  July  1989,  and  approxi¬ 
mately  one-third  of  those  who  completed  29V  training  during  the 
same  period. 

The  GAO  found  that,  on  the  Armed  Forces  Qualification  Test  and 
the  Electronic  Composite,  male  trainees  scored  significantly 
higher  than  did  females  and  white  trainees  performed  better  than 
minority  students.  The  GAO  further  found  that  the  training 
performance  differences  correspond  with  the  test  score  differ¬ 
ences  on  both  tests  for  the  racial  groupings.  The  GAO  noted  that 
for  gender,  training  performance  differences  between  males  and 
females  were  larger  than  test  score  differences.  The  GAO  also 
found  that  the  Electronics  Composite  is  a  better  predictor  of 
success  than  the  Armed  Forces  Qualification  Test. 

The  GAO  further  found  that,  for  its  entire  sample,  the  score  on 
the  Electronics  Composite  explains  18  percent  of  the  variation  in 
course  grades,  more  than  the  Armed  Forces  Qualification  Test — and 
a  GAO-developed  "factor  score,"  which  is  the  weighted  sum  of  all 
Armed  Services  Vocational  Aptitude  Battery  subtests.  The  GAO 
concluded  that,  for  males,  the  Electronic  Composite  score  appears 
to  be  a  better  predictor  of  future  performance  than  the  Armed 
Forces  Qualification  Test.  The  GAO  found,  however,  that  for 
females,  the  Armed  Services  Vocational  Aptitude  Battery  "factor 
scores"  are  better  predictors  of  schoolhouse  performance  than  the 
Armed Forces Qualification Test, which is a better predictor than
the  electronics  composites.  The  GAO  noted  that  for  minority 
soldiers,  the  ability  to  predict  training  course  grades  based  on 
test  scores  is  the  weakest  of  all  groups.  The  GAO  concluded  that 
the  Armed  Forces  Qualification  Test,  or  some  other  general  score 
from the Armed Services Vocational Aptitude Battery, may provide
a better predictor of success for women recruits in electronics-related
training than does the Electronics score. The GAO further
concluded that better predictors of training performance are
needed for minority students. (pp. 3-1 to 3-7/GAO Draft Report)

DoD Response: Partially concur. The Army's testing procedures
for soldiers undergoing Advanced Individual Training are designed
to ensure that soldiers achieve specified training objectives.
To accomplish this, criterion-referenced hands-on performance
tests are administered and scored on a "go/no-go" basis. Such
tests  are  routinely  used  in  the  military  to  evaluate  training 
effectiveness  because  they  provide  meaningful  information  to 
course  managers  on  student  performance,  as  well  as  information  on 
the  degree  to  which  the  course  is  meeting  its  stated  objectives. 
Given  that  such  tests  are  not  designed  to  measure  the  relative 
performance  of  individuals  (i.e.,  these  measures  are  not  norm- 
referenced),  it  is  neither  surprising  nor  particularly  disturbing 
that the GAO found such test results unsuitable for correlational
analysis. Criterion-referenced measurement, such as the
"go/no-go" measures used by the Army, is a psychometrically
sound method when mastery learning is the goal of instruction, as
is the case under discussion.

As  with  other  findings  in  the  report  that  describe  trends  in  the 
Armed  Forces  Qualification  Test  scores  and  examine  differences 
for groups (e.g., gender and race/ethnicity), the statements
about  training  performance  differences  appear  reasonable.  How¬ 
ever,  there  are  problems  with  some  of  the  specific  analyses  the 
GAO  indicates  were  performed  to  reach  those  conclusions.  For 
example,  in  the  Army  sample,  students  from  three  courses  were 
pooled  to  increase  the  sample  size  and  the  course  grades  for  the 
various  specialties  were  assumed  to  be  on  the  same  score  scale, 
or  to  have  the  same  meaning.  In  fact,  course  grades  tend  to  be 
on  course-unique  metrics  and  there  is  no  way  to  evaluate  whether 
a  score  of,  say,  90  in  one  course  means  the  same  in  terms  of 
competence  as  a  score  of  90  in  another  course.  Thus,  the  mean 
reported  as  an  average  of  grades  for  the  three  Army  courses  is 
not  meaningful  and  the  relationship  to  scores  from  the  Armed 
Services  Vocational  Aptitude  Battery  is  tenuous.  Note  that  for 
large  samples,  such  as  white  males,  the  differences  in  the  score 
scales  tend  to  average  out  and  the  correlation  coefficients  are 
reasonably  interpretable.  For  small  samples,  however,  the  dif¬ 
ferent  scales  for  course  grades  are  likely  to  distort  the  corre¬ 
lation  coefficients  and  means.  Since  the  same  analyses  of 
schoolhouse  measures  of  effectiveness  were  used  for  each  Service 
(Findings  D,  E,  and  F),  additional  comments  applicable  to  all 
appear  in  the  DoD  response  to  Finding  G,  the  summary  finding  on 
schoolhouse  measures. 

The report also states that the Electronics Composite is the
weakest  predictor  and  the  Factor  score  is  the  strongest  for 
females.  However,  statistical  results  from  such  a  small  sample 
(76  females)  would  not  be  stable  enough  to  warrant  policy 
changes.  The  results  reported  by  the  GAO,  in  all  probability, 
would  not  be  replicated  given  a  larger  sample.  Also,  the 
adjusted validity coefficients for range restriction in report
Table  3.6  show  for  the  Female  Factor  Score  composite  an  increase 
of  .42.  That  result  is  suspect,  as  normally  such  adjustments 
rarely  provide  an  increase  of  more  than  .20. 

It  should  also  be  noted  that  only  one  of  the  four  training 
courses represented is even open to women (Aviation Anti-Submarine
Warfare Technician), which is not evident without close
study  of  report  Table  3.6.  The  data  for  males  in  report  Table 
3.6  is  the  result  of  merging  four  training  courses  and  produces 
an  unorthodox  analysis  that  requires  an  explanation  of  grading 
differences  which  may  exist  for  the  different  schools. 

As  with  the  previous  finding,  trends  in  the  Armed  Forces  Qualifi¬ 
cation  Test  scores  and  the  Electronics  Composite  in  Navy  courses, 
including differences for groups (e.g., gender and race/ethnicity),
appear reasonable with respect to schoolhouse measures of
training  effectiveness.  However,  the  problems  with  some  of  the 
specific  analyses  the  GAO  indicates  were  performed  to  reach  those 
conclusions  remain  a  factor.  In  the  Navy  sample,  students  from 
four  courses  were  pooled  to  increase  sample  size  and  the  assump¬ 
tion  that  course  grades  for  the  various  courses  have  the  same 
meaning  is  tenuous.  That  limits  the  confidence  in  interpretation 
of  the  relationship  to  scores  from  the  Armed  Services  Vocational 
Aptitude  Battery.  Note  that  for  large  samples,  such  as  white 
males,  the  differences  in  the  score  scales  tend  to  average  out, 
and  the  correlation  coefficients  are  reasonably  interpretable. 

For  small  samples,  however,  the  different  scales  for  course 
grades  are  likely  to  distort  the  correlation  coefficients  and 
means.  Additional  comments  applicable  to  all  appear  in  the  DoD 
response  to  Finding  G,  the  summary  finding  on  schoolhouse  mea¬ 
sures. 

FINDING F: Schoolhouse Measures of Training Effectiveness — Air
Force. The GAO reported that it examined four Air Force courses:
(1) Aircraft Control and Warning Radar Specialist, (2) Automatic
Tracking Radar Specialist, (3) Photo-Sensors Maintenance
Specialist,  Tactical  Reconnaissance  Sensors,  and  (4)  Photo-Sen¬ 
sors  Maintenance  Specialist,  Reconnaissance  Electro-Optical 
Sensors.  The  GAO  found  that,  like  the  Navy,  (1)  "factor  scores" 
are  as  good  or  better  predictors  than  composites,  (2)  for  the 
female  students,  the  Armed  Forces  Qualifications  Test  scores  and 
factor scores outpredict Electronics scores, and (3) it is most
difficult to predict course grades for minority students,
although  factor  scores  explained  10  percent  (46  percent  after 
adjustment). The GAO concluded that because of problems with
some  Army  data,  and  the  special  preparation  of  data  by  the  Navy 
and  Air  Force,  it  would  not  be  appropriate  to  make  inter-Service 
comparisons or make firm judgments about the immediate availability
of psychometrically suitable measures from the Navy and
the Air Force (pp. 3-8 to 3-10/GAO Draft Report).

DoD  Response:  Partially  concur.  As  with  other  findings  in  the 
report,  which  describe  trends  in  the  Armed  Forces  Qualification 
Test  scores  and  examine  differences  for  groups  (i.e.,  gender  and 
race/ethnicity),  the  statements  about  training  performance  dif¬ 
ferences  appear  reasonable.  The  problems  with  some  of  the  analy¬ 
ses  the  GAO  indicates  were  performed  to  reach  those  conclusions 
restrict  interpretability  of  the  findings,  as  was  stated  in  the 
DoD  response  to  Findings  D  and  E.  Additional  comments  appear  in 
the  DoD  response  to  Finding  G,  the  summary  finding  on  schoolhouse 
measures.  The  DoD  does  concur,  however,  with  the  final  statement 
in  Finding  F,  which  indicates  it  would  not  be  appropriate  to  make 
inter-Service  comparisons.  In  addition,  research  performed  by 
the  Air  Force  Human  Resources  Laboratory  confirms  many  of  the  GAO 
findings  about  general  ability  (such  as  is  measured  in  the  Factor 
Scores  the  GAO  examined)  as  a  valuable  predictor  of  schoolhouse 
performance. 

FINDING G: Schoolhouse Measures of Training Effectiveness — Summary.
The GAO raised questions about the differential success in training
for males and females and for whites and minorities, and about
the differential predictive validity of the Armed Services Vocational
Aptitude Battery for these groups. The GAO concluded that
its analysis of gender and race-related differences in mean Armed
Services Vocational Aptitude Battery scores and course grades in
the Army suggests that the Electronics composite is an efficient
simple  predictor  of  training  success.  The  GAO  found,  however, 
that  in  the  Navy  and  Air  Force,  a  more  complex  relationship 
exists  between  the  Armed  Services  Vocational  Aptitude  Battery 
scores  and  course  grades.  The  GAO  noted  that  gender  and  race-re¬ 
lated  differences  in  course  grades  were  quite  small  compared  with 
significant  differences  in  Electronics  scores.  The  GAO  concluded 
that  an  advantage  in  more  general  aptitude,  measured  by  the  Armed 
Forces  Qualification  Test,  can  compensate  for  a  deficit  in  Elec¬ 
tronics — when  the  deficit  is  not  too  great. 

The  GAO  also  noted  that,  while  the  Armed  Services  Vocational 
Aptitude  Battery's  Electronics  composite  score  demonstrated  a 
moderate  ability  to  predict  training  success  for  white  students 
and  males,  it  was  less  successful  for  female  or  minority  stu¬ 
dents.  The  GAO  concluded  the  Factor  Score  that  it  derived  was, 
in most cases, the best predictor of training success because it
utilized  information  from  all  ten  Armed  Services  Vocational 
Aptitude  Battery  subtests. 

The  GAO  concluded  that,  based  on  its  work,  it  was  impossible  to 
determine whether the Armed Services Vocational Aptitude Battery
is  a  weaker  measure  of  ability  for  some  groups — or  if  some  other 
factor  in  schoolhouse  training  contributes  differentially  to  the 
success  of  the  different  groups.  The  GAO  noted  that  the  relative 
inconsistency between school grades and test scores exists and
should  be  addressed  by  both  the  recruiting  and  training  communi¬ 
ties.  The  GAO  further  concluded  that  it  will  become  increasingly 
incumbent  on  the  Services  (1)  to  optimize  selection  criteria  for 
advanced  individual  technical  training  for  women  and  minority 
groups,  (2)  to  provide  compensatory  training  where  needed,  and 
(3)  to  assure  that  no  extraneous  factors  within  the  training 
environment interfere with the full achievement potential. (pp.
3-10 to 3-13/GAO Draft Report)

DoD Response: Partially concur. With respect to GAO findings
describing  trends  in  the  Armed  Forces  Qualification  Test  scores 
and  the  Electronics  Composite  and  examining  differences  for 
groups  (i.e.,  gender  and  race/ethnicity),  the  statements  about 
training  performance  differences  appear  reasonable.  The  analyses 
of  the  relationships  of  scores  from  the  Armed  Services  Vocational 
Aptitude  Battery  (Armed  Forces  Qualification  Test,  Electronics 
Composite,  and  Factor  Score)  and  school  grades  are  flawed  and, 
consequently,  interpretation  of  the  results  of  those  analyses  is 
doubtful.  Because  the  same  analytic  procedures  were  used  for  all 
Services  and  similar  conclusions  drawn,  the  following  comments 
pertain  to  Findings  D,  E,  F,  and  G  alike. 

Problems  with  the  analyses  arise  from  the  following  sources: 

-  pooling  students  from  several  courses,  when  the  grades 
for  different  courses  generally  are  not  comparable; 

-  correction  for  restriction  of  range  on  the  Factor 
Score,  which  resulted  in  correlation  coefficients  that 
are  not  plausible; 

-  lack  of  regression  analyses;  and 

-  small  sample  sizes  for  females. 

In  each  Service,  students  for  several  courses  were  pooled  to 
increase  sample  size  and  the  course  grades  for  the  various 
courses  within  each  Service  were  assumed  to  be  on  the  same  score 
scale,  or  to  have  the  same  meaning.  In  fact,  course  grades  are 
not normally interpretable from course-to-course, because of
between-course  differences  in  scales  and  the  level  of  competency 
inferred  by  a  particular  score.  There  is  no  way  to  evaluate 
whether  a  score  of,  say,  90  in  one  course  means  the  same  as  a 
score  of  90  in  another  course.  (For  the  Army,  three  courses  were 
combined,  four  courses  for  the  Navy,  and  four  for  the  Air  Force.) 
Thus,  the  mean  grades  reported  for  courses  in  each  Service  are 
somewhat  arbitrary  numbers  and  their  relationship  to  scores  from 
the  Armed  Services  Vocational  Aptitude  Battery  is  tenuous.  Note 
that  for  large  samples,  such  as  white  males,  the  differences  in 
the  score  scales  tend  to  average  out,  and  the  correlation  coeffi¬ 
cients  are  reasonably  interpretable.  For  small  samples,  however, 
the  different  scales  for  course  grades  are  likely  to  distort  the 
correlation  coefficients  and  means. 
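
The pooling concern above can be illustrated with a minimal sketch.
All data, names, and the two-course setup below are hypothetical and
are not taken from the GAO report; the within-course rescaling shown
is one common remedy, not a procedure the DoD response prescribes.
The sketch compares the correlation computed on raw pooled grades
with the correlation after aptitude scores and grades are rescaled
within each course.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

def zscore_within(values, groups):
    """Rescale values to mean 0 and SD 1 separately within each group."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(values)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = (values[mask] - values[mask].mean()) / values[mask].std()
    return out

# Hypothetical data: within each course, higher aptitude goes with
# higher grades, but course B admits higher-aptitude students and
# grades on a harsher scale than course A.
aptitude = np.array([100, 105, 110, 115, 120, 110, 115, 120, 125, 130], dtype=float)
course = np.array(["A"] * 5 + ["B"] * 5)
grade = np.array([80, 84, 86, 90, 95, 70, 73, 77, 80, 84], dtype=float)

# Correlation on raw pooled grades, as if both courses shared one scale.
r_pooled_raw = pearson(aptitude, grade)

# Correlation within each course taken separately.
r_within_a = pearson(aptitude[course == "A"], grade[course == "A"])
r_within_b = pearson(aptitude[course == "B"], grade[course == "B"])

# Correlation after rescaling both variables within each course.
r_pooled_std = pearson(zscore_within(aptitude, course),
                       zscore_within(grade, course))

print(r_pooled_raw, r_within_a, r_within_b, r_pooled_std)
```

In this constructed example the raw pooled correlation is far lower
than either within-course correlation, because the difference in
grading scales works against the difference in aptitude between the
two courses.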

The  correlation  coefficients  for  the  Factor  Scores  are  suspi¬ 
ciously  high,  especially  after  correction  for  restriction  of 
range.  The  Factor  Scores  are  based  on  the  first  principal  compo¬ 
nent  of  the  Armed  Services  Vocational  Aptitude  Battery  and  the 
weights  tend  to  be  uniform  (from  .10  to  .14).  The  Factor  Score 
is  the  sum  of  the  10  subtest  standard  scores  and  the  correlation 
coefficient  could  be  computed  using  the  correlation  of  sums.  An 
important  point  is  that  the  weights  are  not  regression  weights 
computed  to  maximize  the  correlation  between  the  aptitude  test 
scores  and  course  grades;  instead,  the  correlation  coefficient 
for  the  Factor  Score  is,  in  effect,  the  average  for  the  10  sub- 
tests. 
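
The "correlation of sums" referred to above can be written out as a
short sketch. The notation is introduced here for illustration and
does not appear in the report: the 10 subtests are taken as standard
scores with unit variance, F is their unweighted sum (an
approximation, since the weights are described only as nearly
uniform), r_{iy} is the correlation of subtest i with course grade
y, and r_{ij} is the correlation between subtests i and j.

```latex
r_{Fy} \;=\; \frac{\sum_{i=1}^{10} r_{iy}}
                  {\sqrt{\,10 + 2\sum_{i<j} r_{ij}\,}}
```

The numerator is 10 times the average subtest correlation with the
course grade, so under these assumptions the Factor Score
correlation is proportional to that average. The two are equal only
when the denominator equals 10, that is, when every pair of subtests
is perfectly correlated; with lower intercorrelations the
denominator is smaller and the Factor Score correlation exceeds the
simple average.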

In  previous  studies,  the  four  subtests  in  the  Electronics  Compos¬ 
ite  (Math  Knowledge,  Arithmetic  Reasoning,  General  Science,  and 
Electronics  Information)  repeatedly  tend  to  have  the  highest 
correlation  with  course  grades  in  these  kinds  of  courses.  As  a 
rule,  therefore,  the  correlation  with  course  grades  should  be 
higher  for  the  Electronics  Composite  than  for  the  Factor  Score. 
Deviations  from  this  expectation  may  be  attributed  to  artifacts, 
such  as  restriction  of  range. 

The  GAO  report  recognizes  that  correlation  coefficients  in  sam¬ 
ples  cannot  be  compared  directly  because  of  range  restriction. 
Adjustments  are  made  to  compensate  for  differences  in  restriction 
of  range.  The  adjusted  values  for  the  Armed  Forces  Qualification 
Test  and  Electronics  Composite  are  plausible  in  that  they  are 
consistent  with  other  analyses;  the  adjusted  values  for  the 
Factor  Score,  however,  are  unduly  high  and  they  lack  plausibil¬ 
ity.  The  procedure  used  to  correct  for  restriction  of  range 
should  be  based  on  the  multivariate  model,  which  involves  complex 
formulae  and  computing  routines.  The  simpler  univariate  model 
may  have  been  used,  which  could  distort  the  adjusted  values  for 
the  Factor  Score. 
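
For reference, the simpler univariate adjustment mentioned above is
usually written as the standard single-predictor correction for
direct range restriction. The sketch below assumes that form; the
report does not state which formula the GAO actually applied, and
the function name and example values are hypothetical. The
multivariate procedure is not shown.

```python
import math

def correct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Univariate correction for direct range restriction.

    r_restricted    -- correlation observed in the selected sample
    sd_unrestricted -- predictor SD in the full applicant population
    sd_restricted   -- predictor SD in the selected sample
    """
    k = sd_unrestricted / sd_restricted
    r2 = r_restricted ** 2
    return (r_restricted * k) / math.sqrt(1.0 - r2 + r2 * k ** 2)

# Hypothetical values: an observed correlation of .30 in a sample
# whose predictor SD is half that of the applicant population.
r_adjusted = correct_range_restriction(0.30, sd_unrestricted=10.0, sd_restricted=5.0)
print(round(r_adjusted, 3))
```

The adjustment grows with the ratio of the two standard deviations,
so an error in either standard deviation carries directly into the
adjusted coefficient.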

Comparisons are made by gender and minority status based on mean
scores  and  correlation  coefficients.  Conclusions  about  the 
appropriateness  of  the  Armed  Services  Vocational  Aptitude  Battery 
for  females  and  racial/ethnic  minorities  are  then  based  on  these 
comparisons.  Such  comparisons  are  a  good  place  to  start,  but 
analyses of gender and race differences should include a comparison
of the respective regression lines (slopes and intercepts),
errors  of  estimate,  and  cutoff  scores.  Analyses  of  differences 
in  mean  performance  on  predictors,  final  school  grades,  and 
differences  in  validity  coefficients  are  not,  by  themselves, 
sufficient.  With  the  more  thorough  regression  analysis,  meaning¬ 
ful  conclusions  can  be  made  about  the  appropriateness  of  aptitude 
tests  for  female  and  racial/ethnic  minorities  compared  to  white 
males. 
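
A minimal sketch of the kind of regression comparison described
above follows. The scores, grades, group labels, and cutoff value
are hypothetical and are not drawn from the report; the sketch only
shows how a slope, an intercept, a standard error of estimate, and a
predicted grade at a cutoff score could be computed for each group.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line of y on x.

    Returns (slope, intercept, standard error of estimate).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    # Two parameters are estimated, so n - 2 degrees of freedom.
    see = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))
    return float(slope), float(intercept), float(see)

# Hypothetical aptitude scores and course grades for two groups.
score = [90, 100, 110, 120, 130]
grade_group_1 = [70, 76, 79, 86, 90]
grade_group_2 = [75, 78, 83, 85, 91]

slope_1, intercept_1, see_1 = fit_line(score, grade_group_1)
slope_2, intercept_2, see_2 = fit_line(score, grade_group_2)

# Predicted grade for each group at a hypothetical cutoff score.
cutoff = 100
pred_1 = slope_1 * cutoff + intercept_1
pred_2 = slope_2 * cutoff + intercept_2

print(slope_1, intercept_1, see_1, pred_1)
print(slope_2, intercept_2, see_2, pred_2)
```

If the two lines differ in slope or intercept, a single common line
would overpredict or underpredict grades for one of the groups at a
given score, which is information that means and correlation
coefficients alone do not provide.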

Even  if  the  DoD  were  to  fully  concur  with  the  statistical  analy¬ 
ses  performed,  interpretation  of  the  results  for  females  would 
remain problematic because of the small sample sizes. The numbers
of females with course grades in the samples are 18 for the Army,
71 for the Navy, and 98 for the Air Force. With such sample
sizes, differences in scales for course grades may be exacerbated;
correction for range restriction could lead to illogical
correlation  coefficients;  and  regression  equations  with  up  to  10 
predictor  variables  would  result  in  unduly  high  correlation. 
Issues  of  generalizing  to  other  samples  and  of  making  policy 
decisions  about  selecting  females  and  assigning  them  to  technical 
specialties  should  always  be  considered  extremely  carefully  and 
be  based  on  thorough  analysis.  Replication  of  results  is  the 
sine qua non of analysis and an adequate sample size is a good
foundation for replication. The conclusion "that the Services
should  consider  developing  a  more  general  ASVAB  (sic)  derivative 
such  as  our  Factor  Score  to  assign  women  and  minorities  to  tech¬ 
nical  training"  (p.  5-2  and  3)  is  reasonable,  and  could  be  pur¬ 
sued  by  the  military  manpower  research  community.  The  report 
provides  a  stimulus  to  continue  efforts  to  improve  the  effective¬ 
ness  of  selecting  and  classifying  recruits,  especially  for  minor¬ 
ities. 
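
The point above about regression equations with up to 10 predictor
variables in very small samples can be illustrated with a simulation
sketch. It assumes purely random predictors and grades, so any
apparent fit is spurious; the sample size of 18 matches the Army
figure cited above, while the simulation setup and names are
hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R-squared from an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_cases, n_predictors, n_trials = 18, 10, 2000

# Predictors and outcome are independent noise in every trial.
values = [
    r_squared(rng.standard_normal((n_cases, n_predictors)),
              rng.standard_normal(n_cases))
    for _ in range(n_trials)
]
mean_r2 = float(np.mean(values))

# For pure-noise predictors the expected R-squared is k / (n - 1).
expected_r2 = n_predictors / (n_cases - 1)
print(mean_r2, expected_r2)
```

With 10 predictors and 18 cases, more than half of the variance
appears to be explained on average even though no real relationship
exists, which is why results from samples of this size would need to
be replicated before supporting policy decisions.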

FINDING  H:  Field  Measures  of  Training  Effectiveness — Army.  The 
GAO  reported  that,  although  it  was  aware  of  numerous  post-train¬ 
ing  evaluation  activities  performed  by  the  individual  services, 
only  the  Army  could  provide  individual  performance  measures.  The 
GAO  reported  that,  by  Army  regulation,  a  soldier's  occupational 
specialty  performance  is  tested  within  6  months  of  completion  of 
training  and  every  year,  thereafter,  under  the  Skills  Qualifica¬ 
tion  Test  program.  The  GAO  found  the  following  regarding  the 
Skills  Qualification  Test  scores: 

-  the  best  predictor  of  Skill  Test  scores  are  final 
schoolhouse  grades; 

-  the Armed Forces Qualification Test and Electronics
scores were also significantly related to the Skill
Test scores for whites and males, but factor scores
consistently outpredicted the composites;

-  for females and non-white soldiers, the Armed Services
Vocational  Aptitude  Battery  scores  were  not  positively 
related  to  future  performance,  as  measured  by  Skill 
Qualification  Test  scores;  and 

-  the  grades  scored  by  females  at  the  schoolhouse  were 
inversely  correlated  with  the  Skill  Qualification  Test 
scores. 

The  GAO  concluded  that  the  traditional  Armed  Services  Vocational 
Aptitude  Battery  scores  may  not  be  the  best  predictor  of  perfor¬ 
mance  for  the  non-traditional  soldier — that  is,  the  female  or 
minority soldier. The GAO observed that better predictors of
success for these groups should be found. (pp. 4-1 to 4-5/GAO
Draft Report)

DoD  Response:  Partially  concur.  The  GAO  appears  to  have  incor¬ 
rectly  assumed  that  Skill  Qualification  Tests  have  a  common 
metric  across  different  specialties,  skill  levels,  and  years. 

Due  to  the  requirement  to  develop  new  tests  each  year,  individual 
tests  are  fielded  with  a  minimum  of  pretesting.  As  a  result, 
means  and  standard  deviations  across  a  specialty  and  even  across 
years  within  the  same  specialty  and  skill  level  may  vary  greatly. 
For  example,  in  the  five  specialties  studied  by  the  GAO,  the 
means on the individual skill level 1 test during 1985-1989
ranged from 74.5 to 88.4, while standard deviations ranged from
3.5  to  14.7. 

During  the  years  1985-1989,  more  than  3800  different  tests  were 
administered  in  more  than  200  specialties  annually  across  skill 
levels  1  to  4.  The  Army  Research  Institute  is  currently  analyz¬ 
ing  this  data  (more  than  1  million  scores)  and  intends  to  report 
Armed  Services  Vocational  Aptitude  Battery  validities  by  both 
race  and  gender  as  well  as  for  sample  size  whenever  sample  size 
is  adequate  for  such  analyses.  Noting  the  GAO  concern  relating 
to  low  validity  for  blacks  and  females  in  their  study,  the  Army 
has  computed  validities  for  these  groups  for  the  1988  Skill 
Qualification Tests. For 71 skill level 1 samples comprised of
at least 50 females, the median corrected validity is .58; for
samples of 50 or more blacks the median validity is .47;
median  validity  for  205  total  samples  is  .57.  While  the  Army 
understands  the  GAO  focused  only  on  highly  technical  specialties, 
total accessions in the five GAO selected specialties numbered
only  310  compared  to  more  than  120,000  for  all  specialties  during 
1988. 

It  is  suspected  that  the  finding  is  affected  by  the  small  samples 
of  females  and  minorities  in  the  GAO  analyses.  The  finding  that 
Armed Services Vocational Aptitude Battery scores were not positively
related to Skill Qualification Test scores for females and
non-white  soldiers  is  contrary  to  the  body  of  research  evidence 
for  predicting  training  grades  in  the  schoolhouse.  The  consis¬ 
tent  finding  in  all  Services  is  that  aptitude  scores  are  about 
equally  valid  for  females,  racial/ethnic  minorities,  and  white 
males, although there may be some over- or underprediction for
females  and  minorities.  Research  results  also  show  that  aptitude 
tests  predict  supervisors'  ratings  of  job  performance  for  blacks 
about  as  well  as  for  whites.  The  results  presented  by  the  GAO 
should  be  evaluated  in  larger  samples. 

The  same  problems  noted  earlier  with  analysis  of  schoolhouse 
training  grades  apply  to  this  analysis  of  Skill  Qualification 
Test  scores: 

-  pooling  of  specialties — Skill  Qualification  Test 
scores  are  not  on  a  common  metric  across  specialties, 
and the same numerical value in different tests does
not,  as  a  rule,  mean  the  same  level  of  competence; 

-  the  correction  for  restriction  of  range  on  the  Factor 
Score  leads  to  distortion  in  the  results; 

-  a  regression  analysis  is  appropriate  and  was  not
performed;  and

-  the  sample  size  of  females  (18  or  21)  is  inadequate  to 
draw  meaningful  conclusions. 
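
The  sample-size  concern  in  the  last  item  can  be  illustrated
numerically.  The  following  is  a  minimal  Python  sketch  and  is  not
part  of  the  DoD  or  GAO  analyses.  It  uses  the  standard  Fisher  z
approximation  to  show  how  wide  a  95  percent  confidence  interval
around  a  correlation  is  for  samples  of  18  or  21  compared  with
larger  samples.  The  correlation  of  .30  is  a  hypothetical  value
chosen  only  for  illustration,  and  the  function  name
correlation_ci  is  an  assumption  of  this  sketch.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z transformation; z_crit=1.96 gives roughly a
    95 percent interval. Requires n > 3 and -1 < r < 1.
    """
    if n <= 3:
        raise ValueError("sample size must be greater than 3")
    z = math.atanh(r)                 # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)       # standard error of z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Hypothetical observed correlation (illustration only, not a reported value).
HYPOTHETICAL_R = 0.30

for n in (18, 21, 50, 310):
    lower, upper = correlation_ci(HYPOTHETICAL_R, n)
    print(f"n={n:>3}: 95% CI approx ({lower:+.2f}, {upper:+.2f})")
```

Under  these  assumptions,  the  intervals  for  samples  of  18  and  21
include  zero,  while  the  interval  for  a  sample  of  310  does  not,
which  is  the  sense  in  which  very  small  subgroup  samples  cannot
support  firm  conclusions  about  validity.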

Research  in  progress  pertaining  to  enlistment  test  development, 
including  computerized  tests,  will  examine  implications  for 
gender  and  minority  subgroups. 

FINDING  I:  Field  Measures  of  Training  Effectiveness — Navy.  The 
GAO  reported  that  it  considered  two  possible  sources  of  field
information  routinely  collected  by  the  Navy  as  measures  of  the 
effectiveness  of  the  training  courses — (1)  Level  II  surveys  and 
(2)  Advancement  in  Rating  Examinations.  The  GAO  found,  however, 
that  the  Level  II  surveys  have  been  effectively  abandoned  by  the 
Navy,  with  none  having  been  performed  since  at  least  1986.  The 
GAO  concurred  with  the  judgement  of  the  test  developers  and 
administrators  that,  because  the  test  is  not  standardized  and  is
not  administered  to  all  graduates,  the  Advancement  in  Rating
Examination  is  "not  a  good  source  of  training  evaluation
feedback."

The  GAO  reported  that,  in  1986,  the  Chief  of  Naval  Operations
requested  that  the  Naval  Training  Systems  Center  determine  the
current  status  of  Navy  training  evaluation  and  provide
recommendations.  The  GAO  further  reported  that,  while  numerous
non-formal  or  non-centralized  activities  were  identified,  the  Naval
Training  Systems  Center  found  that: 

-  the  quality  of  current  Navy  schoolhouse  training 
could  not  be  readily  ascertained  for  the  vast
majority  of  the  courses  being  offered;

-  there  is  a  lack  of  technical  evaluation/assessment 
skills;  and 

-  current  evaluation  activities  are  fractionated,  not 
comprehensive,  and  operating  in  an  environment  of 
obsolete  instructions  and  unclear  objectives. 

The  GAO  reported  that  the  Navy  made  a  number  of  recommendations 
to  upgrade  and  take  a  systematic  approach  to  training  evaluation. 
According  to  the  GAO,  the  Navy  has  assigned  a  three-person  team 
to  review  the  proposals  and  recommend  an  integrated  training 
appraisal  program.  The  GAO  concluded  that,  while  the  Navy  should 
be  commended  for  its  willingness  to  acknowledge  past  evaluation 
deficiencies,  it  seriously  questioned  whether  this  response  is 
appropriate  to  the  severity  and  extensiveness  of  the  problems 
that  the  Naval  Training  Systems  Center  has  documented.  (pp.  4-5
to  4-8/GAO  Draft  Report)

DoD  Response:  Partially  concur.  Level  II  surveys  were
discontinued  by  the  Navy  because  they  were  paper-intensive  and  placed
an  undue  burden  on  the  fleet.  Moreover,  only  limited  methods  of 
evaluating  the  effectiveness  of  schoolhouse  training  were  in 
effect  at  the  time  the  Navy  requested  the  Naval  Training  Systems 
Center  to  determine  the  status  of  evaluation  procedures  and  make 
appropriate  recommendations.  Since  that  time,  however,  the  Navy 
has  successfully  employed  several  means  of  collecting  feedback  on
training  effectiveness.  In  addition  to  the  steps  being  taken  by 
the  Navy  to  enhance  training  evaluation  methods  as  reported  by 
the  GAO,  several  other  programs  are  underway.  These  include  the
(1)  Navy  Training  Appraisal  Program,  (2)  Navy  Training
Requirements  Review,  (3)  Fleet  Training  Appraisal  Program,  and  (4)
Maintenance  Training  Improvement  Program.  These  are  discussed  in 
more  detail  in  the  following  paragraphs.

A  Navy  training  appraisal  program  was  implemented  in  March  1989.
The  process  provides  the  Chief  of  Naval  Operations  with  an
assessment  of  the  adequacy  of  Navy  training  to  support
warfighting  capabilities  in  each  of  the  Navy's  primary  mission  areas  and
focuses  attention  on  specific  areas  where  training  may  be
deficient.  The  training  appraisal  program  allows  scarce  training
assessment  resources  to  be  brought  to  bear  upon  those  training
programs  that  fleet  feedback  reveals  are  most  in  need  of
attention.  The  Navy  training  appraisal  process  has  thus  far  examined
acoustic  operator,  damage  control/firefighting,  electronic
warfare  operator/maintainer,  and  "over-the-horizon"  targeting
systems  training.

There  is  also  an  ongoing  Navy  Training  Requirements  Review,  which 
provides  direct  feedback  between  warfare  sponsors,  Systems
Commands,  the  fleet,  and  the  Naval  Education  and  Training  Command  on
a  scheduled  basis.  That  program  requires  fleet  experts  to  talk 
directly  to  school  personnel  and  provides  valuable  information  on 
training  effectiveness. 

Additional  training  effectiveness  feedback  systems  in  place
include  the  Fleet  Training  Appraisal  Program  and  the  Maintenance
Training  Improvement  Program,  which  provide  fleet  performance
data.  The  Training  Performance  Evaluation  Board  Training
Evaluation  and  Assessment  Division  was  staffed  in  February  of  1990  and
has  as  part  of  its  charter  the  study  of  training  feedback 
systems. 

FINDING  J:  Field  Measures  of  Training  Effectiveness — Air  Force. 
The  GAO  reported  that  it  considered  sources  of  individual  level 
data  for  field  performance  of  Air  Force  personnel  equivalent  to 
those  it  used  for  the  Navy,  but  concluded  that  neither  the
promotion  examinations  nor  the  supervisory  surveys  were  appropriate.
The  GAO  further  concluded  no  individual  data  exist  that  would 
allow  an  analysis  equivalent  to  those  performed  by  the  Army  with 
the  Skill  Qualification  Test  data. 

The  GAO  reported  that  other  Air  Force  training  assessment
procedures  exist,  including  Training  Quality  Reports,  Utilization  and
Training  Workshops,  and  Occupational  Survey  Reports.  According 
to  the  GAO,  the  Training  Quality  Reports  are  part  of  a  reactive 
evaluation  process,  while  the  other  activities  are  more  concerned 
with  front-end  analysis.  (pp.  4-8  to  4-10/GAO  Draft  Report)

DoD  Response:  Partially  concur.  The  Air  Force  is  aware  of  the
potential  shortcomings  of  promotion  examinations  and  supervisory 
surveys  for  evaluating  training  effectiveness,  and  is  currently 
developing  career  field  training  management  guidelines  to  track 
and  enhance  the  training  from  enlistment  throughout  an
individual's  career.  Emphasis  will  be  placed  on  criterion-referenced
objectives  rather  than  the  present  code  levels  for  performance
standards.  These  changes  will  have  a  major  impact  on  the  present
promotion  system.  To  expedite  feedback  from  supervisors
concerning  any  problems  with  recent  graduates,  a  new  policy  was  recently
established  by  the  Air  Training  Command  to  provide  telephonic 
communication  on  a  24-hour  basis  between  the  training  center 
providing  the  training  and  the  supervisor  of  the  graduate.  The 
system  allows  more  effective  and  timely  communication  between  the
supervisor  and  the  training  provider. 

The  Air  Force  does  not  have  Skill  Qualification  Tests  for
performance  and  does  not  plan  to  have  them  in  the  near  future.  Many  of
the  tasks  performed  in  the  field  are  very  complex.  Testing,
recording,  and  documenting  individual  performance  for  statistics
is  very  time-consuming,  requires  additional  manpower,  and  is
cost-prohibitive.  Further,  many  of  the  new  Air  Force  systems  are
single-channel  systems,  which  cannot  be  used  for  extensive
training  or  evaluating  trainees.  All  these  factors  combine  to  make
the  use  of  hands-on  Skill  Qualification  Tests  an  inappropriate 
solution  to  the  problem  of  training  effectiveness  evaluations. 

The  GAO  finding  that  Occupational  Survey  Reports  are  concerned 
with  front-end  analysis  is  true,  but  information  about  what 
first-termers  are  doing  on-the-job  provides  a  good  basis  for  what 
should  be  trained  and  what  is  expected  in  the  initial  skills 
courses.  As  written  in  the  report,  the  paragraph  gives  a  very 
limited  view  of  what  Occupational  Survey  Reports  provide  the 
training  community  and  their  potential  for  training  assessment. 

FINDING  K:  Alternative  Data  Sources:  The  Job  Performance
Measurement  Project.  The  GAO  reported  a  key  impediment  to
establishing  a  field  evaluation  component  of  training  assessment  is
the  expense  of  developing,  testing,  and  administering  measures
that  validly  and  reliably  measure  actual  performance.  The  GAO
noted  that,  beginning  in  the  early  eighties,  a  major  effort,
entitled  "The  Joint-Service  Job  Performance  Measurement
Project,"  designed  to  address  the  measurement  issues,  has  been
underway  under  the  direction  of  the  Office  of  Accession  Policy
located  in  the  Office  of  the  Assistant  Secretary  of  Defense
(Force  Management  and  Personnel).  The  GAO  reported  that  this
project  was  initiated  after  the  Armed  Services  Vocational
Aptitude  Battery  unintentionally  allowed  some  300,000  less  qualified
recruits  into  the  Military  Services  and  resulted  in  field
commanders'  complaints  of  quality  degradation  among  their  personnel.

The  GAO  found  that  the  Joint  Performance  Measurement  project: 

-  did  not  set  out  to  establish  a  link  between
schoolhouse  performance  and  field  performance;

-  concluded  suitable  measures  of  field  performance  did
not  exist  and  undertook  to  develop  them;

-  has  not  reported  any  analyses  of  sex-  and
race-related  differences,  and  has  not  addressed  the
schoolhouse/field  connection;  and

-  concluded  performance  measures  were  expensive  to 
develop  and  frequently  costly  to  administer  and, 
therefore,  may  not  be  suited  to  more  routine  use  as 
measures  of  training  effectiveness. 

The  GAO  concluded  that  the  investment  made  to  develop  the
performance  measures  and  their  surrogates  could  prove  to  be  more
profitable  if  some  of  the  measures  developed  and  the  lessons  learned
were  more  widely  applied  to  the  development  of  realistic
assessment  procedures  for  training.  The  GAO  further  concluded  that  the
lack  of  other  objective,  systematically  collected  field
evaluation  data  renders  meaningful  evaluation  of  training  effectiveness
impossible.  The  GAO  observed  that  decision  makers  in  the
Congress,  the  DoD,  or  the  Services  can  only  react  to  problems  in  the
field  after  they  have  become  apparent  and  have  been  identified  as
training-related.  The  GAO  concluded  that,  given  the  cost  and
complexity  of  today's  military  equipment,  it  is  difficult  to
understand  the  lack  of  evaluative  data  to  monitor  how  well
Service  personnel  are  being  prepared  to  use  and  maintain  those
weapons.  Overall,  the  GAO  concluded  that  among  the  most  serious
deficiencies  it  identified  was  the  inability  of  the  Air  Force
and  the  Navy  to  found  their  evaluation  of  their  selection
procedures  and  schoolhouse  training  in  systematically  collected,
objective  field  performance  data.  The  GAO  further  concluded
that,  without  good  performance  measurement  data,  the  Services  are
not  able  to  maximize  training  effectiveness,  or  even  estimate
realistically  the  success  of  their  training  investment  in
producing  skilled  operators  and  maintainers  of  today's  and  tomorrow's
sophisticated  weaponry.  (pp.  4-10  to  5-4/GAO  Draft  Report)

DoD  Response:  Partially  concur.  The  GAO  analysis  of  the
background,  purposes,  and  findings  thus  far  from  the  Joint-Service
Job  Performance  Measurement  Program  is  generally  accurate.  The
GAO  has  also  correctly  identified  that  hands-on  performance
measures  are  resource-intensive  in  terms  of  labor,  cost,  time,
and  equipment,  which  limits  their  value  for  routine  use  as  field
measures  of  training  effectiveness.  The  issue  of  applying  job
performance  measurement  technology  to  training  was  investigated
in  May  1985,  when  the  Assistant  Secretary  of  Defense  (Manpower,
Installations,  &  Logistics)  solicited  Service  responses  to  an
inquiry  from  Congressman  Les  Aspin,  Chairman  of  the  House
Committee  on  Armed  Services.  One  of  the  Chairman's  questions
specifically  asked  about  Service  plans  for  applying  job  performance  data
to  training  course  design  and  evaluation.  The  Service  responses
suggested  how  they  anticipated  potential  applications  of  job
performance  measurement  data.  Each  of  the  Services  offered  a
plan  for  institutionalization  of  job  performance  measures  and
they  identified  training  evaluation  as  a  likely  additional
application  of  Job  Performance  Measurement  technology,  to  include
introducing  performance  measurement  into  the  training  feedback
system.  The  resource  factors  identified  by  the  GAO,  coupled  with
the  need  to  wait  until  completion  of  the  enlistment  standards
setting  portion  of  the  Job  Performance  Measurement
research,  resulted  in  the  decision  to  defer  full-scale
implementation  of  routine  job  performance  data  collection  for  all
occupations.

It  should  be  noted  there  is  Service  work  ongoing  that  examines 
the  link  between  schoolhouse  performance  and  field  performance. 
For  example,  the  Army's  selection  and  classification  research
program  (which  incorporates  the  Army's  contribution  to  the
Joint-Service  Job  Performance  Measurement  Project)  is  examining  the
link  between  schoolhouse  performance  and  job  performance. 
Schoolhouse  (end-of-training)  and  job  performance  measures  have 
been  developed  and  administered  to  a  longitudinal  sample  in 
several  military  occupational  specialties.  In  addition,  school 
grades  and  Skill  Qualification  Test  scores  have  been  obtained  for 
the  sample  and  analyses  are  underway.  The  Air  Force,  Navy,  and 
Marine  Corps  have  been  performing  similar  analyses  and  the 
results  will  be  applicable  to  understanding  the  link  between 
schoolhouse  performance  and  on-the-job  performance. 

Work  is  also  underway  in  all  of  the  Services  to  determine  the 
efficacy  of  performance  surrogates  for  specific  purposes.  There 
are  technical  and  policy  differences  related  to  measuring  job 
performance  for  validating  a  test  and  measuring  job  performance 
for  evaluating  a  training  system.  Nevertheless,  if  research 
efforts  are  successful,  it  may  be  possible  to  use  surrogates  to 
develop  cost-effective  field  performance  feedback  procedures  that 
could  help  guide  curriculum  development. 


RECOMMENDATIONS 


RECOMMENDATION  1:  The  GAO  recommended  that  the  Assistant
Secretary  of  Defense  (Force  Management  and  Personnel)  direct  the
personnel  research  it  coordinates  among  the  individual  Services
to  investigate  more  sensitive  predictors  of  schoolhouse
performance  for  women  and  minority  students  from  the  Armed  Services
Vocational  Aptitude  Battery  data  it  already  possesses. 

(p.  5-4/GAO  Draft  Report)

DoD  Response:  Concur.  The  Office  of  the  Assistant  Secretary  of
Defense  (Force  Management  and  Personnel)  will  prepare  a
memorandum  to  the  Defense  Manpower  Data  Center  and  the  Services
requesting  that  the  recommended  analyses  be  performed.  We  will  also
ensure  that  research  in  progress  pertaining  to  computerized
enlistment  test  development  will  include  analyses  to  determine
the  sensitivity  of  the  tests  as  predictors  of  schoolhouse
performance  for  gender  and  minority  subgroups.

RECOMMENDATION  2:  The  GAO  recommended  that  the  Secretary  of  the 
Army  direct  the  Training  and  Doctrine  Command  to  review  the
schoolhouse  grading  procedures  identified  within  the  report  as 
deficient  for  their  accuracy,  appropriateness,  and  reliability. 

(p.  5-4/GAO  Draft  Report)

DoD  Response:  Concur.  The  Secretary  of  the  Army  will  direct  the 
Training  and  Doctrine  Command  to  review  the  appropriateness  of 
Fort  Gordon's  testing  procedures  and  their  compliance  with  Army 
policy.  A  plan  of  action  to  remedy  any  existing  deficiencies 
will  be  prepared  by  August  1990. 

RECOMMENDATION  3:  The  GAO  recommended  that  the  Secretary  of  the
Navy  establish  a  firm  deadline  for  developing  a  training
evaluation  program  and  that  he  direct  that  the  adequacy  of  current
resources  allocated  to  this  effort  be  reexamined.  (p.  5-4/GAO
Draft  Report)

DoD  Response:  Concur.  The  Navy  has  several  training  evaluation 
programs  already  in  place.  As  mentioned  previously,  these
include  the  Navy  Training  Appraisal,  the  Navy  Training
Requirements  Review,  the  Fleet  Training  Appraisal  Program,  the
Maintenance  Training  Improvement  Program,  and  the  Training  Performance
Evaluation  Board.  Additionally,  the  Chief  of  Naval  Education  and
Training  plans  to  brief,  by  July  1990,  an  enhanced  integrated 
training  feedback  system  to  the  Chief  of  Naval  Personnel.  A  Plan 
of  Action  and  Milestones  will  be  prepared  by  August  of  1990  to 
implement  that  system. 

RECOMMENDATION  4:  The  GAO  recommended  that  the  Assistant
Secretary  of  Defense  (Force  Management  and  Personnel)  review
alternative  measures  of  field  performance  already  developed  by  the
Services  under  the  Job  Performance  Measurement  project  for
potential  applicability  to  training  and  on-the-job  performance
evaluation.  (pp.  5-4  and  5-5/GAO  Draft  Report)

DoD  Response:  Concur.  During  the  mid-1980s,  the  DoD  explored 
applications  of  the  measures  developed  in  the  Joint-Service  Job 
Performance  Measurement  Program  to  training.  While  the  decision 
made  following  that  review  was  to  defer  full-scale  implementation 
because  of  cost  factors  and  the  fact  that  techniques  for  develop-